INTRODUCTION


The Institute for Computer Applications in Science and Engineering (ICASE)* is operated at the Langley 
Research Center (LaRC) of NASA by the Universities Space Research Association (USRA) under a contract 
with the Center. USRA is a nonprofit consortium of major U. S. colleges and universities. 

The Institute conducts unclassified basic research in applied mathematics, numerical analysis and algo- 
rithm development, fluid mechanics, and computer science in order to extend and improve problem-solving 
capabilities in science and engineering, particularly in the areas of aeronautics and space research. 

ICASE has a small permanent staff. Research is conducted primarily by visiting scientists from univer- 
sities and industry who have resident appointments for limited periods of time as well as by visiting and 
resident consultants. Members of NASA’s research staff may also be residents at ICASE for limited periods. 

The major categories of the current ICASE research program are: 

• Applied and numerical mathematics, including multidisciplinary design optimization; 

• Theoretical and computational research in fluid mechanics in selected areas of interest to LaRC, 
such as transition, turbulence, flow control, and acoustics; and 

• Applied computer science: system software, systems engineering, and parallel algorithms. 

ICASE reports are primarily considered to be preprints of manuscripts that have been submitted to
appropriate research journals or that are to appear in conference proceedings. A list of these reports for the
period October 1, 1998 through March 31, 1999 is given in the Reports and Abstracts section which follows 
a brief description of the research in progress. 


* ICASE is operated at NASA Langley Research Center, Hampton, VA, under the National Aeronautics and Space
Administration, NASA Contract No. NAS1-97046. Financial support was provided by NASA Contract Nos. NAS1-97046, NAS1-19480,
NAS1-18605, NAS1-18107, NAS1-17070, NAS1-17130, NAS1-15810, NAS1-16394, NAS1-14101, and NAS1-14472.
RESEARCH IN PROGRESS


APPLIED AND NUMERICAL MATHEMATICS 


BRIAN ALLAN 

Closed-loop Separation Control Using Oscillatory Flow Excitation 

Experimental results have shown that oscillatory blowing, introduced upstream of a separated boundary 
layer, can effectively delay boundary layer separation. This method for separation control can be used for 
several different flow control problems. Some of the current research areas of separation control are high lift 
enhancement and maneuver control. This project will develop a feedback controller which will control the 
amount of separation in the boundary layer. The feedback controller designed will then be used in a wind 
tunnel test on an airfoil model with oscillatory blowing. 

Currently we have a design methodology for the feedback controller using a robust control design method. 
The controller is designed to track a desired pressure gradient in the separated boundary layer. We will not 
know what the dynamics of the flow system are until the wind tunnel tests are conducted. When the 
wind tunnel tests start, we will be able to get an accurate model of the system. This model will then be 
used in our current controller design methodology. We are also building a hardware interface for the flow 
experiment which will provide the feedback control to the experiment. This hardware interface is designed 
to be transferable to other flow control experiments at NASA Langley. 

This control design method and hardware interface are scheduled to be tested in a future wind tunnel
test. The hardware interface and control design experience gained from this project will be transferred to
other flow control experiments at NASA Langley.

This research was conducted in collaboration with Jer-Nan Juang (NASA Langley), David Raney (NASA 
Langley), and Avi Seifert (NRC). 

EYAL ARIAN

Approximations of the Newton Step for Large-scale Optimization Problems 

Quasi-Newton methods for large-scale optimization problems are powerful but suffer from a slow initial
convergence rate. Our goal is to develop a new iterative method for the solution of large-scale optimization
problems that will allow a better approximation of the Newton step right from the first optimization steps.

In the course of the optimization process, systems of linear equations are constructed that contain the 
linearized state operator and its adjoint. These have to be solved at each iteration to achieve convergence 
of the iterates to the Newton step. We are investigating a defect-correction method to solve these systems
of equations for highly ill-conditioned problems with many design variables. Preliminary numerical tests on 
the potential small disturbance shape optimization problem are promising. 

Our plan is to further investigate the above method for applications that are governed by nonlinear 
equations. This approach can be naturally embedded in an SQP formulation of the problem.

This research was conducted in collaboration with A. Battermann and E. Sachs (Universität Trier,
Germany).
Large-scale Aerodynamic Shape Optimization

The purpose of this work is to develop and apply algorithms which do not require more than a few full 
solutions of the flow equations to obtain the optimum. 

Our approach is to apply approximations at the PDE level to the numerical solution of a practical
large-scale optimization problem. We are working on shape optimization of a 3D geometry using TLNS3D. 

This research was conducted in collaboration with V. Vatsa (NASA Langley). 

H.T. BANKS 

Electromagnetic Interrogation of Structures 

The detection and characterization of subsurface damage (cracks, internal corrosion, etc.) is an important
problem in aging structures such as airfoils. In collaboration with scientists in the Nondestructive
Evaluation Branch at NASA, we are developing computational techniques for inverse problems involving 
electromagnetic interrogation of structures using superconducting quantum interference devices (SQUIDs). 

Our approach is to develop reduced-order model computational methods for Maxwell’s equations in a 
dielectric medium to be used in inverse algorithms. To date, we have developed models based on eddy current 
interrogation of structures. These models are being tested using a full Maxwell solver in ANSOFT which 
computes time-dependent fields in terms of a vector magnetic potential A in phasor form. Our reduced-order
methods are based on Karhunen-Loeve or Proper Orthogonal Decomposition (POD) methods. 

We have made significant progress on the modeling and computational aspects of this problem and are 
currently testing our ideas with simulated SQUID data for model verification and assessment of the ability 
to identify and characterize damage geometries in a structure. 

BORIS DISKIN 

Efficient Methods for Solving Upwind-biased Discretizations of Advection Equation 

The efficiency of methods for solving the advection equation is extremely important in devising solvers 
for complicated computational fluid dynamics problems. Frequently, the overall convergence rate of a
sophisticated solver is determined by the convergence of a built-in algorithm solving the advection equation. The
simplest way to solve the advection operator is to employ downstream marching. If the corresponding dis- 
cretization is a stable upwind discretization and the field of velocities does not recirculate, then this marching 
proves to be a very efficient solver yielding an accurate solution to a discretized nonlinear advection equa- 
tion in just a few sweeps (a single downstream sweep provides the exact solution to a linearized problem). 
However, if a discretization of the advection operator is not fully upwind (e.g., only upwind-biased), then
marching in its pure form is inapplicable and other solution methods should be employed. In this period, we 
systematically studied two methods for solving upwind-biased discretizations of the advection operator: the 
defect-correction method and the multigrid method using semicoarsening. This research was motivated by 
the search for an explanation of convergence properties of an existing full Euler system solver and also by 
the wish to extend the range of available advection solvers taking into account parallelization perspectives. 
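
As an illustration of the downstream marching described above, the following minimal sketch solves a linear, constant-coefficient model problem a*u_x + b*u_y = f with a first-order fully upwind discretization on a uniform Cartesian grid. The model problem, the function names march_upwind and max_residual, and the grid layout are assumptions made for this example and are not taken from the solver discussed in the report. Because every unknown depends only on its upstream neighbors, one sweep in the flow direction satisfies all discrete equations.

```python
def march_upwind(a, b, f, inflow, h):
    """One downstream sweep for a*u_x + b*u_y = f with first-order upwind
    differences. Assumes a > 0 and b > 0, so the flow enters through the
    i = 0 and j = 0 grid lines, where inflow(x, y) prescribes u."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 or j == 0:
                u[i][j] = inflow(i * h, j * h)  # prescribed inflow data
            else:
                # a*(u_ij - u_(i-1)j)/h + b*(u_ij - u_i(j-1))/h = f_ij,
                # solved for u_ij using already computed upstream values
                u[i][j] = (h * f[i][j] + a * u[i - 1][j] + b * u[i][j - 1]) / (a + b)
    return u


def max_residual(a, b, f, u, h):
    """Largest absolute residual of the upwind equations at non-inflow points."""
    n = len(f)
    return max(
        abs(a * (u[i][j] - u[i - 1][j]) / h
            + b * (u[i][j] - u[i][j - 1]) / h
            - f[i][j])
        for i in range(1, n)
        for j in range(1, n)
    )


# Hypothetical test setup: unit source term, zero inflow, 17 x 17 grid.
n, h = 17, 1.0 / 16
f = [[1.0] * n for _ in range(n)]
u = march_upwind(1.0, 0.5, f, lambda x, y: 0.0, h)
residual = max_residual(1.0, 0.5, f, u, h)  # at rounding-error level
```

An upwind-biased stencil also references a downstream neighbor, so the update inside the loop would need a value that has not been computed yet; this is why pure marching no longer applies in that case.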

The defect-correction and multigrid methods have been analyzed in application to discretized advection 
equations corresponding to flow at some angle of attack to a uniform Cartesian grid. We have developed a 
novel comprehensive mode analysis. This analysis predicts the convergence rate for each iteration and the 
asymptotic convergence rate. On the basis of this analysis, we have explained many surprising details observed
in numerical calculations (e.g., establishment of a good asymptotic convergence rate after many poorly
converging defect-correction iterations and fast convergence in a multigrid cycle employing semicoarsening
far surpassing the theoretical limit predicted for standard multigrid algorithms using full coarsening). It
has been found, analytically and experimentally, that the convergence properties of the defect-correction
iterations are grid-dependent. The number of iterations required to converge the algebraic error below the
truncation error level might grow on fine grids as a negative power of the mesh size. In contrast, the
efficiency of the multigrid algorithm does not deteriorate with increasing cycle depth (number of levels)
and/or refinement of the target-grid mesh. This multigrid method uses colored relaxation schemes on all the grids
and, therefore, is very attractive for parallel computing. A new, very efficient adaptive multilevel approach
to deriving a discrete solution approximating the true continuous solution within a given relative accuracy
has been developed. This approach was tested for both the defect-correction and multigrid methods.
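
For reference, the defect-correction iteration discussed above can be written in its generic textbook form. The notation is an assumption introduced here, not the report's own: L_h^t denotes the target (upwind-biased) discrete operator, L_h^d a fully upwind driver operator that can be inverted by downstream marching, and e^(k) the algebraic error of the k-th iterate relative to the exact solution of the target discretization.

```latex
L_h^{d}\, u^{(k+1)} = f_h - \left( L_h^{t} - L_h^{d} \right) u^{(k)},
\qquad
e^{(k+1)} = \left[ I - \left( L_h^{d} \right)^{-1} L_h^{t} \right] e^{(k)} .
```

Each iteration costs one application of the target operator and one marching solve with the driver operator. The convergence behavior is governed by the iteration operator in brackets, which is the object a mode analysis of the kind mentioned above would examine.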

As an additional option, we are going to analyze the predictor-corrector method for solving
upwind-biased discretizations. We also plan to implement some of the proposed ideas in the framework of the
existing 3D full Euler system solver. 

This research was conducted in collaboration with J.L. Thomas (NASA Langley). 

JAN S. HESTHAVEN 

Well-posed Perfectly Matched Layers for Advective Acoustics 

The ability to accurately simulate wave phenomena is important in several physical fields, e.g.,
electromagnetics, ambient acoustics, advective acoustics associated with a mean flow, elasticity, and seismology.

Often the numerical simulations of such problems, due to limited computing resources, must be confined 
to truncated domains much smaller than the physical space over which the wave phenomena take place. In
such cases, numerical reflections of outgoing waves from the boundaries of the numerical domain can falsify 
the computational results. This artifact limits the overall order of accuracy of the algorithm used in the 
computation. This is particularly troublesome in cases where a higher order of accuracy is required by mode
resolution, storage availability, etc. 

Utilizing a mathematical framework created for the development of perfectly matched layer (PML) 
schemes within computational electromagnetics, we have developed a set of strongly well-posed PML
equations for the absorption of acoustic and vorticity waves in two-dimensional convective acoustics under the
assumption of a spatially constant mean flow. 

A central piece in this formulation is the development of a variable transformation that conserves the dis- 
persion relation of the physical space equations. The PML equations are given for layers being perpendicular 
to the direction of the mean flow as well as for layers aligned parallel to the mean flow. 

The efficacy of the PML scheme has been tested by solving the equations of acoustics using a fourth-order 
scheme, confirming the accuracy as well as stability of the proposed schemes. 

The development of a PML for the three-dimensional equations of acoustics is straightforward provided 
only that the mean flow can be considered spatially constant. Of equal importance, however, is the develop- 
ment of PML methods for problems involving smoothly varying mean flows, as in boundary layers and jets. 
While the mathematical tools developed so far certainly are applicable for sufficiently smooth variations, 
new developments are most likely needed to address the general variable coefficient problem and we hope to 
address these questions in the near future. 

This research was conducted in collaboration with S. Abarbanel (Tel Aviv University) and D. Gottlieb 
(Brown University). 
MICHAEL LEWIS


Pattern Search Methods for Nonlinear Optimization 

Pattern search methods for nonlinear optimization have a number of features that make them attractive 
for use in engineering optimization. These methods are easy to understand and implement, are scalably 
parallel, and neither require nor estimate derivatives. 
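
A minimal sketch of one of the simplest members of this family, a compass (coordinate) search for unconstrained problems, illustrates these properties. The function name compass_search, its parameters, and the step-halving rule are choices made for this illustration; this is not the nonlinearly constrained algorithm developed in the project.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free minimization by polling the points x +/- step*e_i.

    A trial point is accepted as soon as it lowers the objective value;
    if no poll point improves, the step length is halved. The search
    stops once the step length falls below tol.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = f(trial)
                if ft < fx:  # simple decrease is enough to accept
                    x, fx = trial, ft
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5  # contract the pattern around the current iterate
    return x, fx


# Hypothetical test objective with its minimum at (1, -0.5).
def quadratic(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


x_best, f_best = compass_search(quadratic, [0.0, 0.0])
```

Only objective values are compared, so no derivative is computed or estimated. The 2n poll points of an iteration are independent of one another and could be evaluated concurrently, which is the source of the parallelism noted above; the sketch evaluates them sequentially for brevity.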

We have developed pattern search algorithms for general nonlinearly constrained optimization guaranteed
to possess first-order stationary point convergence. We are presently engaged in an implementation of the 
new classes of pattern search algorithms. This new implementation will allow us to investigate various 
algorithmic approaches, as well as opportunities for improved computational parallelism. 

Among the algorithmic approaches we will investigate are techniques to improve scaling in pattern search 
algorithms via the aggregation of similarly scaled design variables. We will also investigate opportunities for 
algorithmic steering in connection with pattern search algorithms. 

A Posteriori Finite Element Bounds for Sensitivity Calculations 

In the optimization of systems governed by differential equations one would like to use the coarsest mesh
possible at any given step so as to reduce the cost of the optimization iteration. In a recent series of papers, 
Patera, Peraire, and their collaborators have presented an a posteriori approach to computing quantitative 
bounds on the mesh dependence of certain functionals of the solutions of differential equations. We have 
begun to apply these ideas in the context of optimization. 

We have developed a posteriori bounds for sensitivities of output linear functionals with respect to 
various parameters (such as coefficients) in boundary-value problems. Using either the sensitivity equations
or adjoint equations one can write the output’s sensitivity as a functional of the solution of a system of 
differential equations. One then computes bounds on the error in the sensitivities on a coarse grid relative 
to a finer grid. Numerical results indicate that the bounds can be quite good. We have also extended the a 
posteriori bound approach to certain non-smooth functionals. 

We are currently investigating extensions of the bound procedure to more complex equations and output 
functionals. We are also implementing an approach to using the a posteriori bound procedure in connection 
with pattern search methods, a first step in a larger investigation of using approximate function values with 
error bounds in optimization. 

This research was conducted in collaboration with Tony Patera and Jaime Peraire (Massachusetts
Institute of Technology).

Analysis of Hessians in Parameter Estimation Problems Governed by Differential Equations 

Parameter estimation problems in systems governed by differential equations arise frequently in non- 
destructive evaluation and materials characterization. Similar problems also arise in design optimization. In 
both cases it is useful to understand the analytical nature of the resulting optimization problems. 

We have completed a preliminary study of the Hessians of the objective for a class of optimization 
problems that arise in design and in parameter estimation. We have established how, in many instances, 
the Gauss-Newton approximation of the Hessian may prove to be very much in error when compared to the 
complete Hessian. 
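
To make the comparison concrete, consider the standard nonlinear least-squares setting. This setting is an assumption made for illustration, since the class of objectives studied is not spelled out here; q denotes the parameters and r_i(q) the residuals between model output and data.

```latex
J(q) = \tfrac{1}{2} \sum_{i=1}^{m} r_i(q)^2, \qquad
\nabla^2 J(q) = \underbrace{A(q)^{T} A(q)}_{\text{Gauss-Newton term}}
              + \sum_{i=1}^{m} r_i(q)\, \nabla^2 r_i(q),
\qquad A(q) = \frac{\partial r}{\partial q}(q).
```

The Gauss-Newton approximation keeps only the first term. It is accurate when the residuals are small or nearly linear in q, and it can differ substantially from the complete Hessian when the second sum is not negligible.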

Future work includes extending the analytical approach to more complex equations, and further investi- 
gation of the consequences of this analysis for numerical computation, particularly for quasi-Newton updates 
and preconditioning. 
JOSIP LONCARIC

Spatial Structure of Optimal Flow Control 

Designing distributed control systems begins with the sensor/actuator placement problem. While in 
some situations discrete search of combinatorial complexity seems unavoidable, continuum problems suggest 
solving a related question. If one could sense everything and actuate everywhere, what should one do? The
answer to this question has polynomial complexity (of order N^3, where N is the number of state variables)
and can serve as the initial effectiveness filter capable of rejecting a large portion of the design search space. 
This favorable situation can have several causes depending on the base flow pattern. Our aim is to develop 
efficient numerical procedures to solve this problem for flows in moderate Reynolds number regimes. 

In an earlier work, we developed a rational approximation of the optimal feedback kernel for unsteady 
Stokes flow. For the flow around a cylinder, this approximation was proven to perform within 0.026% of the 
exact optimum even in the worst case. Using the vorticity representation in conformally mapped geometries, 
this approximation is decomposed into the analytic free space solution and a boundary term which can be 
evaluated numerically. This procedure was applied to the NACA 0015 wing. The results demonstrate a 
significant contribution of the boundary to the control effort. 

We are investigating the rational approximation as an additive preconditioner for the nonzero base 
flow case. Since local dynamics is dominated by viscosity, this approximation should correctly describe the 
colocated sensing/actuation singularity in the optimal feedback kernel. As a first step, we are investigating 
a shear flow where the Fourier transform in the streamwise direction can be used to simplify the problem. 
We intend to compare the performance and quality of numerical results in the preconditioned formulation 
with those obtained directly. The insight gained in this study will provide guidance for the development of 
numerical schemes for the full NACA 0015 wing case at moderate Reynolds numbers.

The Coral Project 

The cost of developing complex computer components such as CPUs has become so high that scientific 
applications alone cannot carry the full burden. In the future, scientific computing will have to use mass 
market leverage to overcome the cost barrier. A cost-effective alternative to high-end supercomputing was 
pioneered by Beowulf, a cluster of commodity PCs. By now, high performance Beowulf clusters can be built 
using fast commodity PCs and switched Fast Ethernet. We want to explore the benefits and the limitations 
of this approach, based on applications of interest to ICASE. 

Based on the available performance and price data, we created a list of configurations and at each 
price level selected the dominant configuration. After a discussion of various application benchmarking 
requirements, a system consisting of 32 Pentium II 400 MHz nodes and a dual CPU server was selected. The 
system’s aggregate peak performance using multiple copies of the ATLAS benchmark exceeds 10 Gflop/s, 
while sustained performance on CFD applications is about 1.5 Gflop/s. Our benchmarks show perfect 
scaling with balanced coarse grained parallel codes. Fine grained codes show reasonably good scaling with 
the number of processors. During benchmarking, we discovered and resolved a performance limitation of the 
underlying TCP data transport protocol. Coral has an excellent price/performance ratio, almost an order of 
magnitude better than an equivalent supercomputer. This conclusion applies primarily to balanced coarse 
grained applications (e.g., domain decomposition codes). 

We expect to refine Coral’s performance through further benchmarking, and to use this system in solving 
some real problems. Since the cost of performance is rapidly decreasing, we hope to enhance and expand 
this cluster in the future.

The Coral Project was initiated by Piyush Mehrotra and Tom Crockett. Additional benchmarking was 
done by David Keyes and Brian Allan. 

DIMITRI J. MAVRIPLIS 

Large-scale Unstructured Mesh Computations Using a Parallel Multigrid Solver 

Unstructured mesh Navier-Stokes solvers offer great potential for reducing the turnaround time associ- 
ated with complex geometry aerodynamic analysis. For accurate computation of complicated aerodynamic 
flows, very high resolution grids are required. Furthermore, the large computational overheads associated 
with unstructured mesh methods require the use of efficient solution algorithms which can be ported to 
massively parallel architectures. The purpose of this work is to demonstrate the feasibility of performing 
very large scale unstructured mesh computations in a production setting using existing parallel machines. 

A low memory, rapidly converging unstructured multigrid algorithm has been developed and ported to 
parallel computer architectures. The fine and coarse levels of the unstructured multigrid algorithm are all 
partitioned sequentially before being distributed on the target parallel machines. Because the algorithm 
makes use of implicit line solves, the partitioning must be executed in such a way that the implicit lines 
of the various mesh levels are not intersected by processor boundaries. This is achieved by contracting 
the mesh graph along the implicit lines and partitioning the contracted (weighted) graph rather than the 
original graph of the mesh, which is then used to infer the final mesh partition upon de-contraction. The
communication patterns (which remain static for the duration of the analysis) are then precomputed and 
stored. The implementation of the parallel solver is based on the MPI communication primitives. 
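
The contract, partition, and de-contract sequence described above can be sketched as follows. The function name partition_with_lines and the greedy weight-balancing step are stand-ins chosen for this illustration; the partitioner actually used is not named here, and a real graph partitioner would also try to minimize the number of cut edges in the contracted graph, which this sketch ignores.

```python
def partition_with_lines(num_vertices, lines, num_parts):
    """Assign mesh vertices to parts so that no implicit line is split.

    lines is a list of vertex-id lists, one per implicit line; vertices
    that belong to no line are treated as lines of length one.
    """
    # Contraction: every implicit line becomes one super-vertex whose
    # weight is the number of mesh vertices it contains.
    group_of = {}
    groups = []
    for line in lines:
        for v in line:
            group_of[v] = len(groups)
        groups.append(list(line))
    for v in range(num_vertices):
        if v not in group_of:
            group_of[v] = len(groups)
            groups.append([v])

    # Stand-in partitioner: place the heaviest super-vertices first,
    # always into the part with the smallest current load.
    load = [0] * num_parts
    part_of_group = [0] * len(groups)
    for g in sorted(range(len(groups)), key=lambda k: -len(groups[k])):
        p = load.index(min(load))
        part_of_group[g] = p
        load[p] += len(groups[g])

    # De-contraction: every vertex inherits the part of its super-vertex.
    return [part_of_group[group_of[v]] for v in range(num_vertices)]


# Hypothetical example: 10 vertices, two implicit lines, two processors.
example_lines = [[0, 1, 2, 3], [4, 5, 6]]
parts = partition_with_lines(10, example_lines, 2)
```

Because whole super-vertices are assigned, every vertex of an implicit line ends up on the same processor, so a line solve never has to cross a processor boundary.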

Good scalability of the unstructured mesh multigrid solver has been demonstrated on medium size 
problems involving several million grid points on both a CRAY T3E-600, using up to 512 processors, and an 
SGI Origin 2000, using up to 128 processors. A complete high-lift aircraft geometry case has been solved on
a grid of 25 million points in 4.5 hours on 512 processors of the CRAY T3E. The same case has also been run
on 1450 processors of a CRAY T3E-1200E, which required just over one hour of compute time. 

The current solver has also been benchmarked on the ASCI Red and ASCI Blue-Pacific parallel
computers, illustrating good scalability on these machines as well.

Future work is concentrated on enabling the solution of even larger cases, up to 100 million grid points. 
This will require the parallelization of all preprocessing operations such as mesh partitioning and coarse 
multigrid level construction. This effort is viewed as the first step towards developing a practical large eddy 
simulation capability for aircraft configurations. 

CHI-WANG SHU

High-order Discontinuous Galerkin Method and WENO Schemes

Our motivation is to have high-order non-oscillatory methods for structured and unstructured meshes
which are easy to implement for parallel machines. The objective is to develop and apply high-order
discontinuous Galerkin finite element methods and weighted ENO (WENO) schemes for convection-dominated
problems. The applications will be problems in aeroacoustics and other time-dependent problems with
complicated solution structure.

Jointly with Harold Atkins (NASA Langley), we are continuing to develop the
discontinuous Galerkin method to solve convection-dominated convection-diffusion equations. Emphasis
for this period is put upon studying the stability and accuracy issues involving both internal and domain
boundary conditions. A discontinuous Galerkin method for 2D incompressible flow is also under development
jointly with Jian-Guo Liu (University of Maryland). Jointly with Changqing Hu (Brown University), we 
have been pursuing adaptive methods using structured and unstructured high-order weighted ENO schemes. 
Preliminary results using a structured WENO code on the double Mach reflection problem indicate good 
resolution and a saving of 75% in terms of spatial mesh points over the uniform mesh code. 

Research will be continued for high-order discontinuous Galerkin methods and weighted ENO methods 
and their applications. 

DAVID SIDILKOVER 

Factorizable Schemes and Essentially Optimal Multigrid Solvers for the Flow Equations 

The main objective of this work is to develop discretization schemes that facilitate construction of the 
essentially optimal multigrid solvers for the equations of steady compressible flow. Our first target is the 
Euler equations in two dimensions. However, the methodology being developed is very general. It can be 
extended to Navier-Stokes equations and to three-dimensional problems. 

A factorizable high-resolution scheme for the compressible Euler equations has been constructed. The 
factorizability property is crucial for constructing essentially optimal multigrid solvers, since it makes it
possible to distinguish between the advection and full-potential factors of the system on the level of the discrete
scheme. The key ingredient of such a solver is a relaxation procedure that relies on the auxiliary potential
and stream-function variables and, therefore, utilizes the factorizability property. Another important
implication of the factorizability property is that the scheme should not lose accuracy for low Mach number
flow. The proposed approach also allows the combination of h-ellipticity and high-resolution properties in
one scheme. 

The current work is devoted to extending the scheme/solver to general body-fitted grids. Extensions of
the approach to viscous and three-dimensional problems are in progress as well. 

SEMYON TSYNKOV 

Artificial Boundary Conditions for Aerodynamic and Aeroacoustic Computations 

Many typical problems in aerodynamics and aeroacoustics, including those that present immediate prac- 
tical interest, e.g., flows around aircraft and problems of acoustic radiation/propagation/scattering, are 
formulated on infinite domains. It is, therefore, obvious that any numerical methodology for solving such 
problems has to be supplemented (or, rather, preceded) by some technique that would lead to a finite dis- 
cretization. Typically, the original domain is truncated prior to the actual discretization and numerical 
solution. Subsequently, one can construct a finite discretization on the new bounded computational domain 
using one of the standard techniques: finite differences, finite elements, or other. However, both the continu- 
ous problem on the truncated domain and its discrete counterpart will be subdefinite unless supplemented by 
the appropriate closing procedure at the external computational boundary. This is done by using artificial 
boundary conditions (ABC’s); the word “artificial” emphasizing here that these boundary conditions are 
necessitated by numerics and do not come from the original physical formulation. 

At the current stage of the aforementioned project, we are focusing on the following two research topics. 
First, we construct highly accurate global boundary conditions for the calculation of steady-state flows using 
the new generation of advanced factorizable finite-difference schemes and fast multigrid solvers. For the
initial test cases the boundary conditions are obtained analytically via conformal mappings; at later stages 
we will employ the difference potentials method which has already demonstrated excellent performance in our 
previous work. The new schemes themselves have already led to a multifold reduction in the solution time 
(compared to the standard methods); when combined with the advanced external boundary conditions that 
allow for an order of magnitude decrease in the domain size without loss of accuracy, the new methodology 
may potentially result in more than two orders of magnitude overall reduction in the configuration analysis 
cycle. Second, we develop the exact ABC’s for time-dependent problems. The approach is based on exploiting 
the weak lacunae in numerical solutions of the wave-type equations. This allows effective restriction of the
temporal nonlocality of the ABC’s; otherwise the procedure would be prohibitively expensive. We have
studied the lacunae both analytically and experimentally and have already calculated the solutions to some 
model problems for the wave equation using the new ABC’s methodology; the results seem very promising. 
A series of conference and journal papers is in preparation on both foregoing subjects. 

Future research in the framework of this project will primarily concentrate on developing the unsteady 
ABC’s algorithms for problems in acoustics, including the advective case, and electromagnetics. 

This research was conducted in collaboration with V. Ryaben’kii, D. Sidilkover, S. Abarbanel, and J. 
Nordstrom (ICASE) and V. Vatsa, T. Roberts, C. Swanson, J. Thomas, and H. Atkins (NASA Langley). 
The project is supported by the Director’s Discretionary Fund. 
PHYSICAL SCIENCES, FLUID MECHANICS


RICHARD W. BARNWELL 

Hyperbolic Reynolds Stress Model for Turbulent Boundary Layers 

The boundary layer equations for incompressible mean flow with the turbulence model provided by the 
Reynolds stress equations are shown to be hyperbolic in the outer region where convection and diffusion 
dominate. Because diffusion is of inconsequential magnitude in the turbulent interior, it can be either 
ignored or approximated appropriately there so that the governing equations are hyperbolic across the entire 
turbulent part of the boundary layer. Consequently, hyperbolic solution techniques can be used to advantage 
to solve the turbulent boundary layer as Peter Bradshaw did over 30 years ago with a more approximate 
formulation. The hyperbolic solutions so obtained depend on conditions immediately upstream of the solution 
point and may give a better representation of the diverse behavior in complex three-dimensional boundary 
layers than traditional parabolic solutions. 

Closure assumptions are required to relate the diffusion terms, which are dominated by derivatives of 
time-averaged triple products of the fluctuating velocity components, to the Reynolds stresses. The tradi- 
tional approach is to replace the triple products with terms involving derivatives of the Reynolds stresses and 
solve the resulting parabolic problem. Experimental data show that the Reynolds stresses vary algebraically 
with distance from the mean boundary layer edge in the outer region where convection and diffusion dom- 
inate, and an asymptotic analysis shows that such functions satisfy a differential equation which renders 
the traditional differential representations of the triple products equivalent to the algebraic representation 
developed by Bradshaw. The result is a set of hyperbolic governing equations with fewer modeling constants 
than the corresponding parabolic set. In the hyperbolic approach the additional data are provided by initial
conditions. The hyperbolic stress model is used to explain why the lateral spreading rate of a turbulent 
wedge in a laminar boundary layer is so much larger than the vertical boundary layer growth. 

The next task is to compare the results of this method to those of other methods and experimental data. 

SANG-HYON CHU 

Development of Microwave-driven Smart Material Actuator 

“Wireless” control of actuators with microwaves offers tremendous advantages over hard-wired actuators, 
especially for space applications such as the Next Generation Space Telescope (NGST), in which thousands 
of discrete actuators are required to effect high-precision distributed shape control of the primary reflector. 
This new concept alleviates the need for hard-wired connections resulting in significantly simpler system 
designs and lower system mass. 

3x3 rectenna patches built at JPL were tested in an anechoic chamber by modulating microwave power 
level, frequency, incidence angle, and polarization angle. The PZT 5A multilayer piezoelectric actuator 
was selected as the smart actuator and tested under a direct coupling with a 3x3 rectenna. The obtained 
experimental results indicate that the multilayer piezoelectric actuator can be successfully utilized with a 
wide degree of controllability when the 3x3 patch rectenna converts microwave energy to DC power that, in 
turn, drives the actuator. 

The dispersive nature of microwaves might cause energy loss during transmission. The concept of 
power allocation and distribution will be considered for this reason. Logic circuits embedded in rectennas 
will control power collection and allocation to feed DC power to any actuator where optical correction is 
necessary. A prototype of a power distribution circuit will be fabricated and improved to meet all required 
characteristics in the future. 

AYODEJI DEMUREN 

Streamwise Vorticity Generation in Jets 

Experiments have shown that three-dimensional jets can be used to enhance mixing and entrainment 
rates in comparison to axisymmetric jets. A fundamental understanding of the dynamics of complex, tur- 
bulent jets is required for their prediction and control. Understanding of the evolution of the streamwise 
vorticity fields is essential. Experiments have used streamwise and azimuthal vorticity dynamics to explain 
the presence or absence of axis-switching in experimental measurements of 3:1 aspect ratio rectangular jets 
with different initial conditions. This study showed that the presence of streamwise vorticity pairs with out- 
flow rotation (pumping fluid from the core to the ambient perpendicular to the major axis plane) produced 
axis switching while pairs with the opposite sense of rotation did not. However, in jets with no streamwise 
vorticity at discharge, some other mechanism must originate it. 

Generation mechanisms are investigated via Reynolds-averaged (RANS), large-eddy (LES) and direct 
numerical (DNS) simulations of laminar and turbulent rectangular jets. Complex vortex interactions are 
found in DNS of laminar jets, but axis-switching is observed only when a single instability mode is present in 
the incoming mixing layer. With several modes present, the structure is not coherent and no axis-switching 
occurs. RANS computations also produce no axis-switching. On the other hand, LES of high Reynolds 
number turbulent jets produce axis-switching even for cases with several instability modes in the mixing 
layer. Analysis of the source terms of the mean streamwise vorticity equation through post-processing of the 
instantaneous results shows that a complex interaction of gradients of the normal and shear Reynolds stresses 
is responsible for the generation of streamwise vorticity which leads to axis-switching. RANS computations 
confirm these results. k-ε turbulence model computations fail to reproduce the phenomenon, whereas 
algebraic Reynolds stress model (ASM) computations in which the secondary normal and shear stresses 
are computed explicitly succeed in reproducing the phenomenon accurately. 

More quantitative comparisons to experimental data are planned. 

SHARATH S. GIRIMAJI 

Pressure-strain Correlation Modeling: Testing and Validation 

At the second moment closure level, accurate modeling of turbulent flows is contingent upon accurate 
modeling of the pressure-strain correlation term. Development of pressure-strain correlation models valid 
for complex flows is the objective of this project. 

We have entered the final stages of validating and fine-tuning the model. After successful validation in 
a variety of benchmark problems, more subtle issues on the manner of interpolation between extreme states 
are being addressed. While matched asymptotic expansion techniques are theoretically sound, they appear 
to lead to very complex model forms. Other avenues are being explored. 

Further testing and systematic development for intermediate states of turbulence will come next. 

Rotating Turbulent Flows 

The effect of rotation on turbulence still remains an enigma in many practical flow situations. Our 
objective is to understand the behavior of irrotational fluctuations in a flow with strong mean-flow rotation. 

While the effect of rapid rotation on rotational fluctuations is well described by the Taylor-Proudman 
theorem, the behavior of irrotational fluctuations is not well known. We demonstrate that the Navier-Stokes 
equations permit a large family of irrotational solutions in two-dimensional or rapid rotation limits. This 
could lead to improved insight into the behavior of turbulence in rotating flows. 

The importance of irrotational fluctuations in turbulence needs to be further expounded. 

This research was conducted in collaboration with J.R. Ristorcelli (Los Alamos National Laboratory). 

Non-equilibrium Algebraic Reynolds Stress Modeling 

Computationally viable, yet physically accurate turbulence models are needed for large-scale, practical 
flow computations. We develop such a model starting from the physically sophisticated but computationally 
expensive second-order closure. 

The theory of complex dynamical systems is being studied to develop new reduction procedures. A 
scheme based on minimization of evolution potential was developed and is currently undergoing close scrutiny. 
This appears to offer important advantages over previous methods. A simple procedure for identifying the 
slow (master) variables is developed. 

Extensive testing of the slow variable selection criterion and the reduction procedure will come next. 

C.E. GROSCH 

Simulation of Supersonic Jet Mixing by Tabs in Lobe Ejectors 

Mixing enhancement of high- and low-speed streams is utilized as a means to improve efficiency of 
supersonic combustors, reduce aircraft signatures, and control high-speed jet noise. One common method of 
mixing enhancement is to use lobe mixer ejectors. Another is to place tabs on the edges of the jets. In the 
main, experimental studies are available to evaluate the performance and guide the design of these mixers. 
The objective of this research is to use numerical simulation to examine the performance of lobe ejectors, 
both with and without tabs, in order to understand the physics of the mixing and how it is affected by 
changes in the parameters of these devices. 

A set of numerical calculations is carried out using the compressible, three-dimensional, time-dependent 
Navier-Stokes equations. Tabs are modeled by pairs of counter-rotating vortices. Various geometric 
configurations of the lobe mixers are simulated with periodic side boundary conditions to simulate an array of these 
devices. 

The simulations of the lobe mixer without tabs show that the jet becomes unstable and oscillates in 
the “garden hose” mode. For a particular lobe geometry and velocity ratio, the oscillation has a constant, 
narrow-band frequency near the inflow. Further downstream the amplitude grows and the motion becomes 
nonlinear, leading to spectral broadening. The typical Strouhal number of the narrow-band oscillation is about 
0.45. The physics of this phenomenon is related to the generation of streamwise vorticity at the edges of the jet. As 
the disturbances become nonlinear, rapid mixing between the supersonic and subsonic jets occurs and, by 
about halfway down the channel, the jet and coflow become nearly fully mixed. A set of simulations of the 
same geometry with tabs has begun. The results of the first of these have been partially analyzed. 

Future experimental and numerical studies are required to more clearly define the initial induced vorticity 
field in the round jet. It is hoped that future experiments will use PIV imaging to measure the cross-stream 
vectors. The experimental data could then be used to set the inflow conditions for the simulations. 

Further calculations are planned for the lobe mixer including varying the geometry and varying the 
placement of the tabs on the sides of the lobes. 

ROGER HART 

Flow Diagnostics Using Laser-induced Thermal Acoustics 

The non-intrusive optical measurement of gas-phase parameters such as temperature, flow velocity, and 
pressure is of considerable utility in understanding the airflow around a test body in a wind tunnel. 
Laser-induced thermal acoustics (LITA) is a relatively new optical diagnostic method that has great promise for 
becoming a practical, accurate flow characterization tool. Two laser pulses are employed in LITA. The 
first pulse creates a pair of counterpropagating acoustic wavepackets. The second pulse is diffracted by the 
wavepackets onto a detector. Analysis of the various features of the LITA waveform allows the determination 
of the speed of sound in the medium (and thus the bulk temperature), one or more components of the flow 
velocity, and the density or pressure. Advantages of LITA as compared to other, better-developed diagnostics 
are: LITA allows seedless velocimetry; LITA measurements take only about one microsecond, giving the 
potential for very high repetition rates for the study of turbulent flows; and LITA gives excellent ( 170) 
single-shot accuracy and precision. The goal of the current work is to completely understand the physics of 
the LITA measurement process and to embody that understanding in a quantitative model which has been 
carefully validated against laboratory experiments. 
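
The step from measured sound speed to bulk temperature can be illustrated with the ideal-gas relation a = sqrt(γRT/M). The following is a minimal Python sketch, assuming dry air behaves as an ideal gas with constant γ; the function names and the default values of γ and molar mass are illustrative assumptions and are not taken from the LITA model described here.

```python
import math

R_UNIVERSAL = 8.314462618  # universal gas constant, J/(mol K)


def temperature_from_sound_speed(a, gamma=1.4, molar_mass=0.0289647):
    """Return temperature in K from sound speed a in m/s (ideal gas).

    gamma and molar_mass (kg/mol) default to assumed values for dry air.
    """
    return a * a * molar_mass / (gamma * R_UNIVERSAL)


def sound_speed_from_temperature(temperature, gamma=1.4, molar_mass=0.0289647):
    """Inverse relation: sound speed in m/s from temperature in K."""
    return math.sqrt(gamma * R_UNIVERSAL * temperature / molar_mass)
```

Because temperature scales with the square of the sound speed, a given relative error in the measured sound speed roughly doubles when expressed as a relative error in temperature.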

The fundamental optical and acoustical mechanisms of LITA are well understood; nevertheless, combining 
these to create a model that can accurately and robustly duplicate the results of well-controlled 
experiments has involved considerable effort. On the experimental side, we currently make measurements in 
calibrated air flows using standard laboratory-style lasers and optics, as this allows the greatest flexibility 
and control, though thought is being given to simplifying and hardening the equipment for use in production 
wind tunnels. Modeling currently combines fairly simple models for low-amplitude (linear regime) sound 
waves and standard optical diffraction theory. One recent accomplishment was learning how to correctly 
include the effect of the finite size of the acoustic wavepackets on the decay rate of the LITA waveform. 
The decay of the signal limits the precision of measurements of temperature and velocity, and the rate of 
the decay is a critical piece of information for determining pressure, so this is of some importance for the 
application of LITA. 

The major unresolved modeling issue involves explaining certain systematic differences observed among 
the decay rates of the three spectral components that make up the LITA signal. An additional series of 
experiments is being considered to help constrain our modeling efforts. This research was conducted in 
collaboration with R.J. Balla and G.C. Herring (NASA Langley). 

LI-SHI LUO 

Lattice Boltzmann Scheme for Flow-structure Interaction 

One important problem in the applications of the lattice Boltzmann equation to various flow problems 
is the interaction between fluid flow and solid boundaries, i.e., the implementation of boundary conditions 
at fluid-structure interfaces. The moving boundary problem in high-Reynolds-number flow poses a challenge to 
traditional CFD methods. Usually, turbulence modeling has to be employed in such cases. The present 
work uses the method of the lattice Boltzmann equation (LBE) to simulate the flow-structure interaction 
problem. 

With the LBE method, boundary conditions for objects with complicated geometries are easy to imple- 
ment. We intend to implement a computationally efficient boundary condition for moving boundaries in flows 
with high Reynolds number. Various schemes combining existing bounce-back type boundary conditions with 
interpolation (or extrapolation) are under theoretical study and numerical test. 

A paper entitled “An Accurate Curved Boundary Treatment in the Lattice Boltzmann Method,” au- 
thored by Renwei Mei, Li-Shi Luo (ICASE), and Wei Shyy has been submitted to the Journal of Computa- 
tional Physics. Currently we are working on boundary conditions for a moving boundary. 

The present work has been funded by NASA Langley Research Center under the program of “Innovative 
Algorithms for Aerospace Engineering Analysis and Optimization.” The Co-PIs of the proposal for the 
present work are Renwei Mei (UFL), Li-Shi Luo, and Wei Shyy (UFL). The collaboration also includes 
Pierre Lallemand (Director, ASCI-CNRS, Univ. Paris-Sud), and Dominique d’Humieres (ENS, Paris). 

Lattice Boltzmann Model for Non-ideal Gases 

The key issues in the study of multi-phase (e.g., liquid-vapor) flows are the modeling of interfaces and 
phase transition among different phases. It is difficult to use the Navier-Stokes equations to model the 
inhomogeneous multi-phase flows because the interfacial tracking is a laborious computation. In the past 
few years, a number of lattice Boltzmann models have been developed to model multi-phase flows. However, 
the multi-phase lattice Boltzmann equation is still lacking a rigorous theoretical basis. For instance, previous 
multi-phase lattice Boltzmann models do not have a consistent equilibrium thermodynamics. The present 
work applies the Enskog theory of hard spheres to revise the theory of the multi-phase lattice Boltzmann 
equation. 

With the Enskog theory we were able to derive a new multi-phase lattice Boltzmann model which has 
a consistent equilibrium thermodynamics. We have rigorously demonstrated the deficiencies in the previous 
multi-phase lattice Boltzmann models and provided a systematic procedure to derive a correct multi-phase 
lattice Boltzmann model based upon the Enskog theory (or the revised Enskog theory). A brief account 
of the present work has been published in Physical Review Letters and as an ICASE report. An extended 
version of the work has been submitted to Physical Review E and a corresponding ICASE report is in 
preparation. 

We intend to derive a thermodynamically consistent multi-component lattice Boltzmann model in the 
future based upon the same methodology. 

ALEX POVITSKY 

Computation of Three-dimensional Acoustic Fields 

Our goal is to improve parallelization efficiency of sets of linear banded systems which represent a core 
part of implicit and compact solvers. To use processors for other tasks while they are idle from recursive 
algebraic computations, we run processors by a schedule rather than by communication. This schedule is 
generated before CFD computations are executed. 

To improve parallelization efficiency, we combined our Immediate Backward Pipelined Gaussian Elimination 
(IB-PTA) with the known Two-Way Pipelined Gaussian Elimination (TW-PTA) to obtain the Immediate 
Backward Two-Way Pipelined Thomas Algorithm (IBTW-PTA). To generate the processor schedule, 
we use a recursive algorithm for the row of the first P/2 processors, as described in our ICASE Report, and make 
a symmetric reflection of this schedule for the last P/2 processors. Then we include the exchange of the forward-step 
coefficients between the (P/2)th and (P/2+1)th processors and the solution of a 2x2 system. These tasks are 
performed immediately after completing the forward-step computations for each group of lines on middle 
processors. Measurements on CRAY-T3E show an advantage of the proposed algorithm over the standard 
PTA, the TW-PTA and the IB-PTA for 8 and 16 processors-in-row. Reduction of processor idle time and 
large optimal size of the pocket of lines (low communication latency time) ensure low parallelization penalty 
of the proposed algorithm. 
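
For reference, the serial kernel that pipelined Thomas algorithms distribute across processors is the two-sweep solve of a banded system. The following is a minimal Python sketch for the tridiagonal case only; it is not the IBTW-PTA itself, since the processor schedule, the pipelining of groups of lines, and the exchange between the middle processors are not shown. The function name and argument layout are assumptions made for illustration.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system by the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes the system is well conditioned (e.g. diagonally dominant),
    so that no pivoting is needed.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side

    # Forward step: eliminate the sub-diagonal, line by line.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Backward step: substitute from the last unknown to the first.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both sweeps are recursive along the line: each step needs the result of the previous one. This is why a processor holding the middle of a line must wait for its neighbor, and it is the source of the idle time that the schedule described above is meant to fill.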

We are working on implementing this algorithm in a 3D aeroacoustic solver (with P. Morris) and on 
implementing the processor schedule for multigrid line solvers (with B. Diskin), for front-type solvers 
where grid lines are data-dependent, and for multi-zone solvers where a processor might handle pieces 
of different grids. 

C.-C. ROSSOW 

Investigation of the Properties of the MAPS Flux Splitting Scheme 

Several efforts have been focused on the development of discretization methods that combine the accu- 
racy of flux-difference splittings in capturing of shear layers with the robustness of flux vector splittings in 
capturing strong shock waves. One recent contribution to this class of hybrid flux splittings is the MAPS 
(Mach number based Advection Pressure Splitting) scheme. Significant features of the MAPS scheme are 
its simplicity, its robustness, and the fact that no entropy condition is required. Further research revealed 
that the scheme is very similar to the Roe flux-difference splitting, with the exception that no intermediate 
state needs to be computed. It was found that in the original MAPS formulation only the compressible 
terms of the Roe-scheme are retained. Including the incompressible terms of the Roe-scheme into the MAPS 
formulation extended MAPS to incompressible flows. In the research to be conducted, the connection of 
the MAPS discretization with the Roe-scheme shall be further exploited. On the one hand, a better un- 
derstanding of the terms necessary for low Mach number preconditioning is sought. On the other hand, 
research will be directed towards convergence acceleration by implicit methods. For implicit schemes, the 
flux Jacobians need to be evaluated, which is well established for the Roe-scheme. Due to the similarity 
of MAPS and Roe discretization, it is expected that simplifications to the implicit operators can be made. 
This may be essential for unstructured methods where the directional techniques from structured codes for 
implicit residual smoothing cannot be applied straightforwardly, but in 3D fully implicit methods are still 
prohibitive due to storage requirements. 

The first area of research is the formulation of a consistent preconditioning matrix. An analysis of 
the Roe-scheme revealed that in the incompressible limit pressure terms dominate the artificial dissipation. 
These pressure terms are scaled by the inverse of the speed of sound. In order to remove the stiffness at 
incompressible flows, the speed of sound in these terms is artificially reduced, thus making these terms 
even more dominant. It appeared logical that these artificially increased pressure differences have to be 
balanced by properly scaled, artificially introduced time-derivatives of pressure. Adding these pressure 
time-derivatives to the equations written in strong conservation form leads to a preconditioning matrix 
that is identical to the Choi/Merkle preconditioning in the incompressible limit. However, due to the proper 
scaling, for compressible flows the non-preconditioned compressible equations are recovered, a feature not 
shared by the original Choi/Merkle preconditioner. 

In the second area of research, the possibility of exploiting the MAPS formulation for an acceleration 
technique similar to implicit residual smoothing will be investigated. In the MAPS formulation, advective and 
pressure terms appear separately. Using an implicit, scalar smoothing for the advective terms in each equation 
while treating the pressure terms explicitly as a source term may result in an acceleration technique similar 
to the directional smoothing well established in structured methods. However, the directional dependence 
will be avoided, making the technique feasible for unstructured methods. Depending on the results obtained 
with this simplified implicit scheme, the MAPS discretization may be incorporated into a fully 
implicit formulation. 

ROBERT RUBINSTEIN 

Shock Wave Propagation in Weakly Ionized Gases 

It has been proposed that the mechanism responsible for anomalous properties of shock waves in weakly 
ionized gases could be identified by measuring the relaxation time of these properties following extinction 
of the plasma source, and matching it to the relaxation times of the nonequilibrium phenomena known to 
exist in weakly ionized gases. When these relaxation times cannot be measured directly, they are inferred 
theoretically, usually by assuming relaxation from a state nearly in thermal equilibrium. This proposal 
therefore requires the understanding of relaxation from a steady state far from thermal equilibrium. In order 
that the matching be unambiguous, the relaxation rates in the weakly ionized gas must be known precisely. 

We investigated this relaxation for two typical problems: the relaxation of a steady state described 
by a power-law distribution function, and the relaxation of a non-equilibrium steady state in a gas of 
light particles diffusing in a gas of heavy particles. In both examples, it is found that relaxation is much 
slower than relaxation from a near-equilibrium state. The explanation is that if the Boltzmann equation is 
satisfied away from the momentum space sources and sinks which maintain the non-equilibrium steady state, 
relaxation to thermal equilibrium requires that the effects of extinguishing the sources and sinks diffuse over 
all of momentum space. This relaxation can be very slow. We conclude that the relaxation times in a 
non-equilibrium weakly ionized gas may be evaluated incorrectly if exponential relaxation from a near-equilibrium 
state is assumed. A correct calculation will require a more detailed molecular model of the weakly ionized 
gas, at the level of a Boltzmann equation at least. 

The pressure fields produced in the regions of unbalanced charge ahead of and behind the shock have been 
proposed as sources of increased sound speed and anomalous shock properties. The possibility of non-ideal 
gas corrections to the equation of state due to large electrostatic forces will be investigated next. 

This research was conducted in collaboration with A.H. Auslender (NASA Langley). 

Boundary Layer Receptivity in the Presence of Random Surface Roughness and Acoustic Excitation 

There is still no well-established procedure for incorporating transition in turbulence calculations. While 
aerodynamic flows can be computed successfully using any of several different turbulence models if the 
transition location is prescribed in advance, no single turbulence model can reliably predict transition. If 
transition is computed incorrectly, the entire flow calculation is generally unsatisfactory. 

As part of a larger program of integrating transition and turbulence models, the first stage of transition, 
boundary layer receptivity, is being considered from a probabilistic viewpoint. The analysis allows both 
the surface roughness distribution and the acoustic excitation, which combine to excite Tollmien-Schlichting 
waves, to vary randomly. A simple stochastic differential equation is being investigated as a model of 
this process. Preliminary Monte Carlo simulations demonstrate the effect of random surface roughness in 
enhancing receptivity. 
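
The report does not state the stochastic differential equation under study. Purely to illustrate how a Monte Carlo estimate of this kind is set up, the Python sketch below assumes a hypothetical linear amplitude model, dA = g A dx + s dW, with constant growth rate g and additive random forcing of strength s, integrated by the Euler-Maruyama method. The model, the parameter names, and the function names are all assumptions and should not be read as the model used in this work.

```python
import math
import random


def simulate_amplitude(a0, growth, noise, dx, n_steps, rng):
    """Integrate the assumed model dA = growth*A*dx + noise*dW downstream.

    Uses Euler-Maruyama steps of size dx; rng is a random.Random instance.
    """
    a = a0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dx))  # Wiener increment over one step
        a += growth * a * dx + noise * dw
    return a


def monte_carlo_mean_square(n_samples, seed, a0, growth, noise, dx, n_steps):
    """Average A**2 at the downstream station over n_samples realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = simulate_amplitude(a0, growth, noise, dx, n_steps, rng)
        total += a * a
    return total / n_samples
```

Collecting the sampled amplitudes into a histogram at each downstream station, rather than averaging their squares, would give an empirical estimate of the amplitude probability density function.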

The analysis will be extended to permit prediction of the probability density function of receptivity 
amplitude as a function of downstream position. The calculation will also be extended to more realistic 
models of receptivity, including downstream variability of the growth rate of Tollmien-Schlichting waves. 

This research was conducted in collaboration with S.S. Girimaji (ICASE) and C.L. Streett (NASA 
Langley). 

Theory of Rotating and Stratified Turbulence 

The theory of weak turbulence describes the inertial range structure of rotating turbulence, considered 
as a system of interacting inertial waves. It is natural to ask whether a similar description of the dissipation 
range is possible when wave effects persist into the dissipation range. This analysis is also motivated by 
the known anisotropy of energy transfer in rotating turbulence: unlike non-rotating turbulence, in which 
energy is transferred from larger to smaller scales of motion, in rotating turbulence, energy is simultaneously 
transferred to the plane perpendicular to the rotation axis. 

It is known that dissipation range interactions are between modes with nearly collinear wavevectors. It 
is shown that the dispersion relation of inertial waves permits such interactions to be resonant only when 
the wavevectors are nearly perpendicular to the rotation axis. Accordingly, the dissipation range in strongly 
rotating turbulence is concentrated near this wavevector plane. Since inertial range interactions transfer 
energy into this region, it is plausible that the dissipation range should be concentrated near it. 
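
The resonance argument can be made explicit with the standard dispersion relation for inertial waves, which this summary does not write out. The notation below is assumed for illustration: Ω is the rotation vector, θ is the angle between a wavevector and the rotation axis, and s and σ denote signs.

```latex
% Inertial-wave dispersion relation (s = \pm 1 labels the wave mode)
\omega(\mathbf{k}) = s\,\frac{2\,\boldsymbol{\Omega}\cdot\mathbf{k}}{|\mathbf{k}|}
                   = 2 s\,\Omega\cos\theta_{\mathbf{k}}

% Triad resonance conditions
\mathbf{k} + \mathbf{p} + \mathbf{q} = 0, \qquad
\omega(\mathbf{k}) + \omega(\mathbf{p}) + \omega(\mathbf{q}) = 0

% Collinear limit: |\cos\theta| takes the same value c for all three modes,
% so each frequency equals \pm 2\Omega c and the resonance condition becomes
(\sigma_k + \sigma_p + \sigma_q)\, 2\,\Omega\, c = 0,
\qquad \sigma_k, \sigma_p, \sigma_q \in \{+1, -1\}
```

A sum of three signs is always odd and so cannot vanish. In the collinear limit the condition therefore requires c = 0, that is, wavevectors perpendicular to the rotation axis. For nearly collinear wavevectors the same reasoning suggests that c must be small.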

This research was conducted in collaboration with Ye Zhou (ICASE and Tuskegee University). 

NAIL YAMALEEV 

A High-order Accurate Method on a Moving Grid Adapted to the Solution 

It is known that the attainment of high-order accuracy for problems with shocks is problematic, since a 
first-order error introduced by the shock-capturing procedure can persist globally downstream. One of the 
most effective ways to reduce this error is to diminish the grid spacing in the shock region alone rather than 
refine the grid in the entire computational domain. The main purpose of the present work is to elaborate 
a high-order accurate shock-capturing scheme on a moving grid dynamically adapted to the solution, that 
enables one to increase the resolution of high gradients as well as improve the accuracy of the solution in 
smooth flow regions. 

High-order linear and nonlinear shock-capturing schemes are used to solve the 2D unsteady Euler equations 
written in general curvilinear coordinates. For the linear shock-capturing scheme, the interpolation 
set for the approximation of the solution is fixed as a function of grid location. For the nonlinear scheme, 
the solution is represented by using a high-order accurate polynomial reconstruction, so that the adaptive 
stencils employed in the high-order spatial operator are biased towards the smoothest information available. 
To generate a grid including such important properties as smoothness, orthogonality and adaptation simul- 
taneously, the variational approach proposed by Brackbill and Saltzman is employed. Since the Jacobian of 
transformation depends on the temporal coordinate, the geometric conservation law originally introduced by 
Thomas and Lombard must be satisfied. Then the geometric conservation law equation is solved numerically 
along with the flow conservation law using the same conservative difference operators as those employed for 
approximating the governing equations. The high-order accurate flow solver and the adaptive grid generator 
have been implemented. We are currently joining these codes so that the geometric conservation law is 
satisfied automatically at each time step. 

We plan to apply the present method to calculate both steady and essentially unsteady flows with shocks. 

This research was conducted in collaboration with M. Carpenter and J. Thomas (NASA Langley). 

YE ZHOU 

On Higher-order Dynamics in Lattice-based Models Using Chapman-Enskog Method 

Compared to traditional methods in computational fluid dynamics (CFD), the lattice-based models are 
simple and easy to implement on computers. The advantages and disadvantages of the original lattice gas 
automata (LGA) have been well documented. The lattice Boltzmann equations (LBE) were later introduced 
to remove some of the drawbacks. A further simplification to the LBE is achieved using the BGK procedure 
(LBGK). Indeed, it is well established that the Navier-Stokes equation can be deduced from the low-order 
Chapman-Enskog expansion. Many authors have further asserted that Burnett-like equations can be 
obtained by carrying the Chapman-Enskog expansion to higher order. The motivation of this work is to 
carry out these higher-order Chapman-Enskog expansions to investigate whether it is consistent to do so. 

We found that two conditions determine whether lattice-based models can have 
higher-order dynamics when the classical Chapman-Enskog expansion is used. These conditions are the number of 
conservation laws and the space and time discretization. The pure diffusion model, a system with only one 
conserved quantity, is first presented to illustrate that higher-order dynamics are allowed. We then turn 
our attention to the lattice-based hydrodynamics equations. After noting the feature of non-commutative 
cross time derivatives, we demonstrate how Burnett-like equations can be obtained for lattice-based 
hydrodynamics models using the classical Chapman-Enskog expansion method. 
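
To make the phrase "a system with only one conserved quantity" concrete, the following is a minimal Python sketch of an LBGK pure diffusion model. It assumes a one-dimensional, three-velocity (D1Q3) lattice with weights 2/3, 1/6, 1/6 and periodic boundaries; the lattice, the weights, and the function names are illustrative assumptions, not the specific model analyzed in this work. The only collision invariant is the density.

```python
WEIGHTS = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # velocities 0, +1, -1


def init_from_density(rho):
    """Start every node at the local equilibrium f_i = w_i * rho."""
    return [[w * r for r in rho] for w in WEIGHTS]


def density(f):
    """Sum the three populations at each node."""
    return [f[0][x] + f[1][x] + f[2][x] for x in range(len(f[0]))]


def lbgk_diffusion_step(f, tau):
    """One collide-and-stream update on a periodic lattice."""
    n = len(f[0])
    rho = density(f)
    # BGK collision: relax each population toward w_i * rho at rate 1/tau.
    post = [[f[i][x] - (f[i][x] - WEIGHTS[i] * rho[x]) / tau for x in range(n)]
            for i in range(3)]
    # Streaming: rest particles stay, the others hop one node left or right.
    new = [[0.0] * n for _ in range(3)]
    for x in range(n):
        new[0][x] = post[0][x]
        new[1][(x + 1) % n] = post[1][x]
        new[2][(x - 1) % n] = post[2][x]
    return new
```

At leading order in the Chapman-Enskog expansion this scheme reproduces the diffusion equation with diffusivity (τ - 1/2)/3 in lattice units. Terms of higher order in the expansion are the kind of correction whose consistency is examined here.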

The results reported in this work can be used to analyze theoretically systems where the hydrodynamic 
description may break down; a typical example is the simulation of micro-electro-mechanical systems 
(MEMS). 

This research was conducted in collaboration with Y.H. Qian (Columbia University). 

COMPUTER SCIENCE 


PO-SHU CHEN 

Parallel Solution of Coupled Aeroelastic Problems 

The accurate prediction of aeroelastic response is essential in the design of high performance aircraft. 
It requires solving the coupled fluid and structure equations simultaneously. The objectives of this research 
are to investigate a variety of different approaches for solving aeroelastic problems, to establish a proper 
module between structure and fluid simulations, to solve the aeroelastic response, and to research a better 
integration algorithm for communication between fluid and structure equations. 

A new package, Load and Motion Transfer (LMT), has been developed to be a ‘bridge’ between CFD 
and FEM software for aeroelastic simulation. It is capable of interpolating the initial nodal coordinates of 
the fluid mesh from the structure nodal displacement, and of integrating the structure nodal force from the 
fluid pressure. It is superior to the FASIT code, currently being used by the MDO branch of NASA Langley, 
in terms of flexibility, accuracy, and user-friendliness. 

Since a reliable transfer program is now available, the next stage is a simple static aeroelastic problem. 
The fluid research code developed by Dimitri Mavriplis and structure research code developed by Charbel 
Farhat have been selected for this purpose. However, the package here is capable of solving the steady state 
of the aeroelastic problems only. It cannot solve real time-dependent problems, like vibration. The second 
stage of improvements is to consider the proper approach for the heavy communication aeroelastic package. 
The details of the approach are still in discussion. 

This research was conducted in collaboration with Tom Zang and Anthony Giunta (NASA Langley), 
Dimitri Mavriplis (ICASE), and Charbel Farhat (University of Colorado). 

THOMAS W. CROCKETT 

Porting PGL to Beowulf-class PC Clusters

The development of low-cost computational clusters based on commodity processors and networking 
components has become an important new trend in parallel computing. Many organizations have installed 
such systems, and many more are planning to do so. In the near future, Beowulf-class clusters could become 
the platform of choice for many challenging scientific and engineering computations. To derive maximum 
benefit from these systems, users will need the same tools and capabilities that have been developed for use 
on proprietary parallel computing systems. 

One of these tools is the PGL rendering system, developed at ICASE to provide runtime visualiza- 
tion support for parallel applications. PGL currently runs on half a dozen different MPP systems. We 
have recently been working to develop a version for Linux-based PC clusters, using ICASE’s Coral system, 
a Beowulf-class cluster, as a development platform. Although we expected this to be straightforward, a 
number of problems have arisen involving compilers and low-level communication layers. Consequently, a 
substantial portion of our effort has involved installing and testing compilers, message passing libraries, net- 
work interfaces, and job schedulers. Serial and parallel versions of PGL are now running on the Coral cluster, 
using 400 MHz Pentium II processors and Fast Ethernet communication hardware. Serial performance on 
a benchmark suite is good, ranging from 70% to 107% of that of a 300 MHz Sun UltraSPARC II and from
39% to 71% of that of a 250 MHz MIPS R10000. Parallel performance results are awaiting resolution of
problems with Coral’s network interfaces.

When testing and performance evaluation of the Linux/PC version of PGL is completed on Coral, we
plan to release it to NASA’s HPCCP/CAS community as part of PGL 1.2. Longer term plans include
additional testing and algorithmic modifications for distributed shared memory architectures such as the
SGI Origin2000 and HP Exemplar, where scalability has so far been poor. We also have plans to incorporate
additional functionality in PGL, and to develop improved user interfaces for interactive applications. 

Application of Parallel and Distributed Computing to Visualization and Data Assimilation Problems in the 
Atmospheric Sciences 

To implement the Vice President’s vision of a Digital Earth, vast quantities of data from disparate 
sources must be integrated into an intuitive, accessible representation. NASA’s Earth Science Enterprise 
sees Digital Earth as a promising framework for making much of its remote sensing data available to the 
scientific community and the general public. To implement the Digital Earth concept, many technologies 
will need to be brought to bear, among them visualization, networking, and high-performance computing. 

We are exploring the potential for parallel and distributed computing and visualization techniques to 
contribute to the data processing and data assimilation requirements of Digital Earth. We have used ICASE’s 
PGL rendering system to develop a prototype visualization application which combines a medium-resolution 
(9 km) elevation model of the Earth with a true-color surface map, including support for several different 
map projections. Preliminary performance tests have been conducted on Langley’s 16-processor Origin2000
system, and on a network of Sun UltraSPARC workstations at ICASE. Although rendering performance is 
good, the results suggest that multi-resolution data representations and additional graphics functionality 
(such as triangle strips and more aggressive clipping algorithms), in addition to higher processor counts, are 
needed to deliver interactive performance with models of this size (18.7 million triangles). User interfaces 
which are tailored to the application will also be required, and we have begun evaluating Java for this purpose. 
In related activities, we served on Langley’s Digital Earth Planning Team, and continued participating in 
meetings of the federal Inter-agency Digital Earth Working Group. 

We plan to combine atmospheric data from Langley’s LITE experiment with the digital terrain model 
described above to produce an interactive tool for visualizing vertical structure in the atmosphere. The 
ultimate goal is to develop a responsive, user-friendly system which will combine atmospheric data from 
a variety of sources to obtain a better understanding of the physical processes involved. We also want to 
investigate approaches for incorporating much larger terrain models, such as USGS’s 30-arc-second global 
elevation dataset (933 million grid points). We hope that the techniques developed will lead us toward 
Digital Earth’s goal of providing interactive access to multi-petabyte datasets, a challenge which is beyond
the capability of current computing technology. 
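
The size quoted above for a 30-arc-second global grid can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation; the helper name and the bytes-per-sample figure are assumptions made for illustration, not properties taken from the dataset documentation.

```python
ARC_SECONDS_PER_DEGREE = 3600


def global_grid_points(spacing_arcsec):
    """Columns, rows, and total samples of a regular global lat/lon grid."""
    cols = 360 * ARC_SECONDS_PER_DEGREE // spacing_arcsec
    rows = 180 * ARC_SECONDS_PER_DEGREE // spacing_arcsec
    return cols, rows, cols * rows


cols, rows, total = global_grid_points(30)
print(cols, rows, total)       # 43200 21600 933120000

# Assumption for illustration only: 2 bytes per elevation sample.
bytes_per_sample = 2
print(total * bytes_per_sample / 1e9)   # about 1.87 (gigabytes)
```

The total of roughly 933 million samples is about fifty times the 18.7 million triangles of the prototype model, which is one way to see why multi-resolution representations are needed.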

DAVID E. KEYES 

Parallel Implicit Solvers for Simulation of Multiscale Phenomena 

The development and application of parallel implicit solvers for multiscale phenomena governed by PDEs 
are our chief objectives. Newton-Krylov-Schwarz (NKS) methods have proven to be broadly applicable,
architecturally versatile, and tunable for high performance on today’s high-end commercial parallel platforms
(e.g., Cray T3E, SGI Origin, IBM SP). Both structured-grid and unstructured-grid CFD legacy codes have
been ported to such platforms and reasonable objectives for algorithmic convergence rate, parallel efficiency, 
and raw floating point performance have been met. However, architectural challenges have increased on 
the next generation of high-end machines, as represented, for instance, by the ASCI “blue” machines at 
Lawrence Livermore and Los Alamos National Laboratories, and also on Beowulf clusters, such as ICASE’s 
Coral. Our primary efforts are concentrated on algorithmic adaptations of NKS methodology appropriate 
for the emerging architectures and on evaluation of new software tools and methodology to get the most 
performance out of them. 

The general approach embodied in the NKS family of algorithms is documented in previous ICASE 
technical reports, among other places. Specific emphases in the most recent reporting period include
enhanced per-node floating point performance, multilevel preconditioning, optimization, and evaluation of NKS
applications on the ICASE Beowulf system.

Per-node floating point performance has been a source of major consternation for users (and apologists) 
of high-end machines. Anecdotal evidence, such as a list of recent “Bell Prize” peak performance winners, 
indicates that sparse, grid-based computations do not stack up very competitively against other scientific
simulations. We have shown that attention to cache line reuse in the organization and ordering of grid-based
data that is iteratively dragged up and down the memory system in a typical PDE code can make an order of
magnitude difference in execution time, apart from parallelism, and an experimental program to study this
effect via hardware event counters is on-going. Our ultimate aims are to apply formal optimization techniques
to the layout of program data for optimal register and cache residency, to prepare for “Processors-in-Memory”
(PIM) programming that vendors have announced in future products, and to evaluate the algorithmic utility 
of multivector forms of sparse algorithms with better cached matrix reuse. 

Single-level Schwarz preconditioning is sufficient for many purposes, especially unsteady or pseudo-time 
continuation applications. However, we have recently demonstrated on some highly nonlinear radiation 
transport applications that 2-level Schwarz methods, with a coarse level that is removed from the fine grid
by many powers of two in density, are not only superior in convergence but can be somewhat superior in
overall execution time, in spite of the global coordination required. 

Optimization is usually the real goal of computational simulation capability. As a goal unto itself, parallel 
optimization is much studied, but optimization subject to a high-dimensional set of equality constraints
coming from a discretized PDE is a situation in which the tail wags the dog. Following the leads of O. Ghattas 
and D.P. Young in this area, we are exploring the utility of the NKS “rootfinder” as a Lagrange-NKS 
optimizer. 

In terms of peak performance, the ICASE Beowulf cluster is cost-effective hardware, but the software 
environment is co-critical. In tests of the same Euler benchmark used on the ASCI machines, we have
shown (see the Coral webpages) that the Portland Group compilers are particularly effective on native 
non-cache-optimized code, with uniprocessor running times that beat the ASCI processors and also the 
same 400 MHz Pentium II with NT compilers. For cache-optimized code, the R10000 and Power2 are still 
somewhat superior, but the per-node performance of Coral is almost competitive, independent of economic 
considerations. 

We will continue to develop NKS methods in implicit parallel CFD, examining a variety of algorithmic, 
programming paradigm, and architectural issues. We will also increase the complexity of the models in our 
NKS radiation transport work, in accordance with the ASCI project roadmap. 

This research was conducted in collaboration with W. Kyle Anderson (NASA Langley), Dana Knoll (Los 
Alamos National Laboratory), Dinesh Kaushik, Nilan Karunaratne, and Xin He (Old Dominion University), 
and Satish Balay, William D. Gropp, Lois C. McInnes, and Barry F. Smith (Argonne National Laboratory).

GERALD LUTTGEN 

Statecharts via Process Algebra 

Statecharts is a visual language for specifying synchronous reactive systems which is popular among 
software engineers, despite the complexity of its step semantics. It extends finite-state machines by concepts 
of concurrency, hierarchy, and priority. Most Statecharts variants do not have a compositional semantics 
and, thereby, prohibit the reuse of specifications of systems’ components. The reason for this prohibition 
is the subtle interplay between micro and macro steps, as imposed by Statecharts’ synchrony hypothesis 
and the principle of causality. The focus of this research is to develop a compositional process-algebraic 
framework which is expressive enough to embed several Statecharts variants. 

The process algebra that has been developed is inspired by timed process languages and unifies the 
principles of Statecharts semantics, such as concurrency, causality, and synchrony. It represents macro steps 
as sequences of micro steps which are enclosed by clock ticks. The benefits of this approach include the
establishment of a compositional framework (1) which is suitable for embedding several Statecharts variants, 
(2) which is intuitive and simple since causal orderings are not encoded in transition labels, (3) which can be 
equipped with behavioral equivalences carried over from traditional process algebras, and (4) which allows 
for interfacing Statecharts to verification tools. 

In the future, we hope to apply the insights into the relationship between clock semantics and Statecharts semantics obtained
during this research to develop a Statecharts variant which is suitable for specifying distributed reactive 
systems. 

This research was conducted in collaboration with Rance Cleaveland (SUNY at Stony Brook) and Michael
von der Beeck (TU Munich). 

Applying Model Checking Tools to the Verification of Flight Guidance Systems 

Mode confusion is one of the most serious problems in aviation safety. Today’s digital flight decks are too 
complex for pilots to be aware of the actual states, or modes, of all systems. A year ago, NASA
Langley started an initiative to analyze the mode logic of a flight-guidance system to uncover weaknesses in
its design which may lead to mode confusion. For this purpose, the mode logic was modeled as a finite state
machine, and the theorem prover PVS was used to reason about the system. The objective of this research is
to investigate whether model checking techniques (i.e., sophisticated, automated state-exploration methods)
are able to achieve this task “better” than theorem proving.

In this light, the mode logic is modeled and analyzed by using three popular model-checking tools: Murφ,
SMV, and Spin. In general, all three tools are able to handle the task fairly well and promise to scale up.
The modeling is most elegant in Murφ and SMV since their specification languages match the characteristics
of the mode logic as a modular, synchronous system. Murφ’s rich language even allows for carrying over the
PVS specification of the mode logic one-to-one, and its ability to specify and verify invariants enables the
efficient verification of many properties related to mode confusion. For the latter, however, the temporal
logic CTL, as employed in SMV, is more practical due to its flexibility to reason about system paths rather
than system states. Moreover, SMV’s model checker, which is based on Binary Decision Diagrams, is faster 
than the other tools and outperforms PVS by returning verification results instantly. Finally, diagnostic 
information generated by each of the three tools is as adequate as the information obtained when using PVS. 

In the future, we will model larger parts of the digital flight deck, such that the model-checking techniques
may investigate more complex problems related to mode confusion. 
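
The state-exploration idea behind invariant checking can be illustrated with a very small explicit-state search. The sketch below is written in Python rather than in the input language of Murφ, SMV, or Spin, and the three-mode logic it explores is entirely hypothetical; it is not the flight-guidance model analyzed in this work.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Breadth-first exploration of all reachable states.

    Returns None if the invariant holds everywhere, otherwise a shortest
    path of states from the initial state to a violating state.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def successors(state):
    """Hypothetical lateral mode logic: state = (active_mode, nav_armed)."""
    active, armed = state
    # Pilot presses the HDG button: toggles between HDG and the ROLL default.
    yield ("ROLL" if active == "HDG" else "HDG", armed)
    # Pilot presses the NAV button: deselects NAV if active, else toggles arming.
    if active == "NAV":
        yield ("ROLL", False)
    else:
        yield (active, not armed)
    # Environment event: the armed NAV mode captures and becomes active.
    if armed:
        yield ("NAV", False)


start = ("ROLL", False)
# Property: NAV is never active and armed at the same time.
ok = check_invariant(start, successors, lambda s: not (s[0] == "NAV" and s[1]))
# A deliberately false property, to show the diagnostic trace.
trace = check_invariant(start, successors, lambda s: s[0] != "NAV")
```

Here `ok` is `None` because the property holds in every reachable state, while `trace` is a shortest counterexample: the sequence of states leading to an active NAV mode. Diagnostic traces of this kind are what the text above refers to as diagnostic information.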

This research was conducted in collaboration with Victor Carreno (NASA Langley). 

KWAN-LIU MA 

Image Graphs: A Novel Approach to Visual Data Exploration

Effort spent generating and collecting data is wasted unless there are effective means to organize and 
understand this data. This fact poses a problem in some modern visualization research. For example, in 
volume rendering the current data handling and visualization technology cannot handle the sheer size of 
emerging datasets. While various efforts have been made to condense datasets and accelerate rendering 
calculations, little work has been done to coherently represent the process and results of this type of visu- 
alization. However, this information about the data exploration is knowledge that should be shared and 
reused. The objective of this research is to develop a mechanism which not only offers a representation of 
this knowledge but also serves as an interface for visual data exploration. 

We use a graph-based approach to represent not only the results but also the process of data visual- 
ization. Each node in the graph consists of an image and the corresponding visualization parameters used 
to produce it. Each edge in the graph shows the change in rendering parameters between the two nodes it 
connects. We thus call this design image graphs. Image graphs are not just static representations since
users can interact with a graph to review a previous visualization session or to perform new rendering. In 
particular, operations which cause changes in rendering parameters can propagate through the graph. Image 
graphs help streamline the process of visual data exploration in two ways. First, the graphs give the user 
a representation of the relationship between the visualization parameter changes and the images produced 
using them. Often these relationships are not obvious just through inspection of the rendered images. An 
understanding of how specific rendering parameter changes will affect the image output is important because 
it reduces the number of images the user must produce to find parameters which yield a useful image, and 
these images can be quite time consuming to produce. Second, the dynamic features of the graphs, such as 
annotation and automatic pruning, facilitate collaboration and animation. They also help speed the search 
for good rendering parameters by allowing users to perform operations on groups of nodes. These opera- 
tions include simple modification of rendering parameters, combination of nodes to form “child” nodes with 
their properties, and propagation of modifications through the graph. We have implemented a web-based 
volume visualization system which uses the image graph design for the purpose of supporting remote and 
collaborative visualization. 
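
The node and edge structure described above can be sketched as a small data structure. This is an illustrative Python model under simplifying assumptions (parameters stored as a flat dictionary, images represented by a placeholder, hypothetical function names); it is not the implementation of the web-based system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an image graph: an image plus the parameters behind it."""
    name: str
    params: dict
    image: object = None                 # placeholder for the rendered image
    children: list = field(default_factory=list)


def derive(parent, name, **changes):
    """Create a child node whose parameters differ from the parent's."""
    child = Node(name, {**parent.params, **changes})
    parent.children.append(child)
    return child


def edge_label(parent, child):
    """The edge between two nodes: parameters that differ, as (old, new)."""
    keys = sorted(set(parent.params) | set(child.params))
    return {
        k: (parent.params.get(k), child.params.get(k))
        for k in keys
        if parent.params.get(k) != child.params.get(k)
    }


def propagate(node, **changes):
    """Apply a parameter change to a node and everything derived from it."""
    node.params.update(changes)
    node.image = None                    # stale image: must be re-rendered
    for child in node.children:
        propagate(child, **changes)


# Hypothetical session: rotate the view, then change opacity at the root.
root = Node("root", {"opacity": 0.5, "angle": 0})
rotated = derive(root, "rotated", angle=30)
propagate(root, opacity=0.8)
```

After the propagation step, the edge between `root` and `rotated` still records only the change in viewing angle, while both nodes carry the new opacity and are marked for re-rendering.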

We are presently designing a comprehensive user study to understand the extent to which the image 
graphs can be shared and reused, and to refine the design of the visualization system we have built and its 
image graph interface. Furthermore, we think image graphs would be useful for any type of data exploration 
problem which produces images of data as a function of some set of parameters. Therefore, in addition 
to volume visualization, other possible applications include radiosity calculations, 2D image filtering, and 
polygon-based rendering. Our future work includes demonstrating that our approach is indeed useful for 
these other problem domains. 

PIYUSH MEHROTRA


Arcade: A Distributed Computing Environment for ICASE 

Distributed heterogeneous computing is being increasingly applied to a variety of large-size computational
problems. Such computations, for example, the multidisciplinary design optimization of an aircraft,
generally consist of multiple heterogeneous modules interacting with each other to solve the problem at
hand. Such applications are generally developed by a team in which each discipline is the responsibility of 
experts in the field. The objective of this project is to develop a GUI-based environment which supports the 
multi-user design of such applications and their execution and monitoring in a heterogeneous environment 
consisting of a network of workstations, specialized machines, and parallel architectures. 

We have been implementing a Java-based three-tier prototype system which supports a thin client 
interface for the design and execution of multi-module codes. The middle tier consists of logic to process the 
user input and also to manage the resource controllers which comprise the third tier. In the last few months 
we have focused on the issue of resource discovery and monitoring. In particular, we have implemented 
an add-on module to manage the resources based on the JINI technology developed by Sun for resource 
management. JINI allows independent resources to announce their presence and current status to a central 
server. This module provides a client interface which allows the user to monitor the current status of the 
resources. One of the issues with JINI is that it uses the multicast protocol for its discovery and join 
processes. Such protocols do not work over subnets or across domains. We have designed a hierarchical 
implementation of servers which allows the resources to announce their presence across the whole Arcade 
environment even if it spans multiple domains. Similarly, it allows the resource allocation module to query 
the status of resources across the whole environment. 
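
The hierarchical arrangement of servers can be pictured with a toy model in which each domain has a local lookup server that forwards announcements to its parent. The Python sketch below is purely illustrative: the class, its methods, and the domain and resource names are hypothetical, and it does not use or mimic the actual JINI API.

```python
class LookupServer:
    """Toy lookup server for one domain, optionally linked to a parent."""

    def __init__(self, domain, parent=None):
        self.domain = domain
        self.parent = parent
        self.resources = {}          # resource name -> (origin domain, status)

    def announce(self, name, status, origin=None):
        """Record a resource locally and forward the announcement upward."""
        origin = origin or self.domain
        self.resources[name] = (origin, status)
        if self.parent is not None:
            self.parent.announce(name, status, origin)

    def query(self, predicate=lambda status: True):
        """Names of known resources whose status satisfies the predicate."""
        return sorted(
            name for name, (_, status) in self.resources.items()
            if predicate(status)
        )


# Hypothetical two-domain environment under a single root server.
root = LookupServer("arcade-root")
site_a = LookupServer("site-a", parent=root)
site_b = LookupServer("site-b", parent=root)

site_a.announce("workstation-1", "idle")
site_b.announce("cluster-1", "busy")

everything = root.query()                          # spans both domains
idle_only = root.query(lambda status: status == "idle")
```

In this sketch, discovery inside a domain stays local, and only the forwarding between servers crosses domain boundaries, which is the part that multicast alone cannot provide.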

We continue to develop the system, adding other features such as support for conducting parameter
studies. We also intend to expand the kinds of modules that can be used by Arcade, in particular providing
support for CORBA-based components. 

This research was conducted in collaboration with K. Maly, A. Al-Theneyan, and M. Zubair (Old 
Dominion University). 

Languages for High Performance and Distributed Computing 

There are many approaches to exploiting the power of parallel and distributed computers. Under this 
project, our focus is to evaluate these different approaches, proposing extensions and new compilation tech- 
niques where appropriate. 

Recently a proposal was put forth for a set of language extensions to Fortran and C based upon a
fork-join model of parallel execution; called OpenMP, it aims to provide a portable shared memory
programming interface for shared memory and low latency systems. However, these extensions ignore the
issue of data locality, which becomes a performance issue on shared address space machines that use a
physically distributed memory system. We have proposed a set of OpenMP extensions to allow users to express the
distribution of the data structures in a manner similar to the one used in HPF. We are currently in the 
process of implementing these extensions in order to study their efficacy. 
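
To make the notion of an HPF-style distribution concrete, the following Python sketch computes which processor owns each element of a one-dimensional array under a BLOCK distribution, where the block size is the ceiling of n/p. It illustrates only the index mapping; the function names are hypothetical, and the syntax of the proposed OpenMP extensions is not shown here.

```python
def block_size(n, p):
    """Elements per processor for a BLOCK distribution: ceil(n / p)."""
    return -(-n // p)


def owner(i, n, p):
    """Rank of the processor that owns global index i (0-based)."""
    return i // block_size(n, p)


def local_range(rank, n, p):
    """Global indices stored on the given processor (may be empty)."""
    b = block_size(n, p)
    lo = min(rank * b, n)
    hi = min(lo + b, n)
    return range(lo, hi)


# Ten elements over four processors: blocks of 3, 3, 3, and 1.
layout = [list(local_range(rank, 10, 4)) for rank in range(4)]
```

With such a mapping available, a loop iteration that updates element i can be scheduled on `owner(i, n, p)`, so that most memory references stay in the memory physically closest to the executing processor.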

We have also continued our study on the applicability of HPF to a series of codes using semi-structured
grids, including multiblock, semi-coarsening multigrid, and structured AMR algorithms. We have examined
a range of data distribution strategies for these algorithms and have tried to characterize the situations
under which each of these strategies would produce the best results. 

Opus, a language jointly developed by ICASE and the University of Vienna, provides high-level support
for programming multimodule applications. In the last few months we have redesigned and reimplemented 
the Opus runtime system. We have also completed the compiler front-end necessary for translating Opus 
programs to target the runtime system. This translator has been implemented using the Vienna Fortran 
Compiler System of the University of Vienna. The system allows users to translate and execute Opus 
programs across a network of workstations. We are in the process of evaluating our design and enhancing it 
to incorporate support for distributed processing within Opus modules. 

This research was conducted in collaboration with B. Chapman (University of Houston), Erwin Laure 
(University of Vienna), and H. Zima (University of Vienna). 

ALEX POTHEN 

Parallel Algorithms for Incomplete Factorization Preconditioners 

The parallel computation of incomplete factorization (ILU) preconditioners for solving large systems of 
equations has, until recently, remained an elusive goal. We propose to develop new algorithmic approaches 
that avoid the serial bottlenecks that have plagued existing algorithms, to implement these algorithms, and 
to identify applications where these preconditioners are effective. 

The new algorithm is based on a characterization of the fill (zero elements in the coefficient matrix 
becoming nonzero during the factorization) in terms of paths in the adjacency graph associated with the 
coefficient matrix. We assume that the adjacency graph can be partitioned into subgraphs of roughly equal 
sizes such that few edges are cut by the partition. We map the subgraphs to processors, form a subdomain 
interconnection graph, and order the subdomains so as to reduce global dependences. On each subdomain, 
we locally reorder the interior vertices before the boundary vertices. This reordering limits the fill that joins 
a subgraph on one processor to a subgraph on another, and enhances the concurrency in the computation. 
The preconditioner computation takes place in two phases: in the first phase, each processor computes the
rows of the preconditioner corresponding to the interior vertices of its subdomain. In the second phase,
the rows corresponding to the boundary nodes are computed. 
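
The interior-before-boundary reordering can be sketched as follows. This Python fragment assumes the adjacency graph is given as a dictionary of neighbor lists and that a partition into subdomains is already available; it shows only the classification and local ordering of vertices, not the factorization, and the function names are hypothetical.

```python
def split_interior_boundary(adj, part):
    """Classify the vertices of each subdomain.

    A vertex is a boundary vertex if at least one neighbor lies in a
    different subdomain; otherwise it is an interior vertex.
    Returns {subdomain: (interior_vertices, boundary_vertices)}.
    """
    result = {}
    for v in sorted(adj):
        s = part[v]
        interior, boundary = result.setdefault(s, ([], []))
        if any(part[w] != s for w in adj[v]):
            boundary.append(v)
        else:
            interior.append(v)
    return result


def local_ordering(adj, part):
    """Order each subdomain's interior vertices before its boundary vertices."""
    split = split_interior_boundary(adj, part)
    order = []
    for s in sorted(split):
        interior, boundary = split[s]
        order.extend(interior + boundary)
    return order


# A path graph 0-1-2-3-4-5 split into two subdomains of three vertices.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
order = local_ordering(adj, part)   # [0, 1, 2, 4, 5, 3]
```

Rows for the interior vertices (0, 1 and 4, 5 in this example) depend only on data within their own subdomain, so the first phase can proceed on all processors without communication; only the boundary rows (2 and 3) involve neighboring subdomains.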

Our preliminary results on the SGI Origin show efficiencies greater than 75% on up to 16 processors. 
We are continuing to develop our parallel implementation, and are incorporating new algorithms that we 
have designed for efficient serial computation of preconditioners. 

This research was conducted in collaboration with David Hysom (Old Dominion University and ICASE). 

Spindle: An Algorithmic Laboratory for Ordering Algorithms 

We have begun to work on an algorithmic laboratory for quickly prototyping promising algorithms and 
experimenting with a collection of algorithmic variants for several ordering problems. Among these are the 
fill reduction problem: Order the rows and columns of the coefficient matrix to reduce the fill in sparse 
Gaussian elimination (both complete and incomplete factorizations); and the sequencing problem: Given 
a set of elements, and pairs of elements that are related, order the elements such that related elements 
are numbered consecutively. We employ object-oriented design techniques (OOD) to make the laboratory 
flexible and easy to extend. 

OOD manages complexity by means of decomposition and abstraction. We decompose our software 
into two main types of objects: structural objects corresponding to data structures, and algorithmic objects 
corresponding to algorithms. This design decouples data structures from algorithms, permitting a user to 
experiment with different algorithms and different data structures, and if necessary develop new algorithms
and data structures. We have implemented seven variants from the family of minimum degree ordering 
algorithms using this design paradigm. Some of these algorithms were developed only in the past few years, 
and prior to our work, there was no single code that implemented all of these algorithms. Our implementation 
makes it possible for us to change ordering algorithms midstream while ordering a problem. We have found 
this to be of benefit, since a hybrid algorithm that employs the multiple minimum degree (MMD) algorithm 
and switches at later stages to the approximate minimum degree (AMD) algorithm can improve performance 
for problems where either algorithm has poor performance. These ordering algorithms are quite sophisticated, 
and their performance on various problem classes is poorly understood. Our algorithmic laboratory enhances 
our understanding of these issues since encapsulation makes it possible to examine the state of the objects 
in our code during execution. 
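
For readers unfamiliar with this family of algorithms, the basic idea can be shown with a textbook minimum degree ordering on an explicit elimination graph. The Python sketch below is neither MMD nor AMD and is not taken from Spindle; it is a deliberately simple variant, with hypothetical function names, that also counts the fill produced by a given ordering.

```python
def eliminate(adj, order):
    """Simulate symbolic elimination and return the number of fill edges."""
    g = {v: set(ns) for v, ns in adj.items()}
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
        nbr_list = sorted(nbrs)
        # Eliminating v pairwise connects its remaining neighbors.
        for i, a in enumerate(nbr_list):
            for b in nbr_list[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


def minimum_degree_order(adj):
    """Repeatedly eliminate a vertex of smallest current degree."""
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda x: (len(g[x]), x))   # ties broken by label
        order.append(v)
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}
    return order


# Star graph: eliminating the center first fills in all pairs of leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
natural_fill = eliminate(star, [0, 1, 2, 3, 4])              # 6
md_fill = eliminate(star, minimum_degree_order(star))        # 0
```

Production codes avoid forming the elimination graph explicitly and differ mainly in how they approximate or batch the degree updates, which is where variants such as MMD and AMD depart from this sketch.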

We have also implemented wavefront-reducing algorithms, such as the Cuthill-McKee and Sloan ordering
algorithms, in our library. Spindle, our code, is available as a stand-alone program and with an interface
to Matlab.
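
A basic form of the Cuthill-McKee ordering mentioned above can be written in a few lines. The following Python sketch is a simplified textbook version (it starts each component from a vertex of minimum degree rather than searching for a pseudo-peripheral vertex) and is not the Spindle code; bandwidth is used here as a simple stand-in for the wavefront measures.

```python
from collections import deque


def cuthill_mckee(adj):
    """Breadth-first ordering that visits neighbors by increasing degree."""
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Start each connected component from a vertex of minimum degree.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda w: (degree[w], w)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    return order


def bandwidth(adj, order):
    """Largest distance between the positions of two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)


# A path graph whose vertices are labeled in a scrambled order.
graph = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
before = bandwidth(graph, sorted(graph))            # 3
after = bandwidth(graph, cuthill_mckee(graph))      # 1
```

On this small example the ordering recovers the natural numbering of the path, reducing the bandwidth from 3 to 1.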

This research was conducted in collaboration with Gary Kumfert (Old Dominion University and ICASE). 

KEVIN P. ROE 

Parallelization of a Multigrid Incompressible Viscous Cavity Flow Solver Using OpenMP 

Effective use of parallel machines requires easily maintainable and portable programming models that 
allow users to exploit parallelism in applications written in a standard high-level language. MPI provides
portability; however, it can be more difficult to maintain and is not a high-level programming model. High
Performance Fortran (HPF) is portable and fairly easy to maintain. OpenMP is also fairly easy to maintain,
although it is portable only across shared memory machines. On such a machine, OpenMP has some
advantages over HPF and MPI. One is that it allows users to parallelize their code incrementally. Another
is that data residing in memory does not have to be reshaped when the number of processors is changed.

To evaluate OpenMP’s capabilities, we examine a two-dimensional multigrid incompressible viscous 
flow solver. This solver, originally written to be run sequentially, required only one major change. The
Symmetric Gauss-Seidel (SGS) algorithm that was originally used had to be replaced because its red-black
parallel version was numerically unstable. Since we were more interested in testing OpenMP’s capabilities,
a simple parallel Jacobi algorithm was substituted in its place. Results of the code’s parallelization using
OpenMP on the SGI Origin2000 at NASA Ames were promising. Parallel efficiencies were in the 90-100%
range for four processors on a problem size of 512x512. Tests using a different number of processors for each
grid level at runtime were also conducted. We were able to reduce the overhead associated with using too 
many processors on a small problem size by specifying the number of threads (and hence processors) for 
each grid level at runtime. 
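
The reason a Jacobi sweep parallelizes so easily is that every new value depends only on values from the previous iterate. The Python sketch below illustrates this for a model Poisson problem with a five-point stencil; the model problem and the function name are assumptions chosen for illustration, and the fragment is neither the flow solver itself nor OpenMP code.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a square grid with spacing h.

    u and f are lists of rows; boundary values of u are kept fixed.
    All updates read only the old array u, so the rows of the result
    can be computed independently, for example one block of rows per thread.
    """
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):            # independent iterations
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                + h * h * f[i][j]
            )
    return new


# Tiny example: boundary fixed at 1, single interior unknown, zero source.
u0 = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
f0 = [[0.0] * 3 for _ in range(3)]
u1 = jacobi_sweep(u0, f0, 0.5)           # u1[1][1] == 1.0
```

A Gauss-Seidel sweep, by contrast, overwrites values in place, so the result depends on the order in which points are visited; this is why a parallel version needs a coloring such as red-black.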

We are still investigating where the loss in efficiency is occurring; we believe that larger problem sizes 
will yield better parallel efficiencies when more processors are utilized. We will also examine a mechanism 
for determining the ideal number of processors to utilize for each grid level at runtime.

This research was conducted in collaboration with Piyush Mehrotra (ICASE). 

LINDA STALS


Solution Techniques for Radiation Transport Equations 

When modeling radiation transport, a system of three nonlinear time-dependent equations is often used. 
Due to the behavior of the nonlinearities, this system is computationally expensive to solve. We are studying 
the use of two different approaches to reduce the solution time, namely, the use of better solution techniques, 
such as multigrid methods, and the use of parallel machines. 

As a preliminary study of the radiation transport equations, we have considered the special case where 
all energies are in equilibrium. In such a case, the system of equations can be reduced to a single equation. 
This single equation is interesting in its own right as it contains strong nonlinearities and large jumps in 
the coefficients. The results and lessons learned in the study of this single equation will be used when we 
implement the system of three equations. 

The discretization technique we used was the finite element method with piecewise linear basis elements. 
We are currently comparing our results with those obtained by other groups, which use different discretization 
techniques, to ensure that the finite element method is ‘capturing’ the right information. 

We also compared the use of Newton’s method with the FAS (nonlinear multigrid) scheme. We found 
that when the jumps in the coefficients were not too large both methods performed well. However, when the 
size of the jumps was increased we needed to modify our algorithms. In particular, for the FAS scheme to 
work properly, the equation on the coarsest grid had to be solved to a high degree of accuracy. For Newton’s 
method, automatically calculating the step size greatly reduced the number of iterations. Furthermore, the 
use of adaptive refinement helped the solution process as the approximation calculated on the coarser grids 
gave a good initial guess to the solution on the current grid. As the system of three equations also contains 
large jumps in the coefficients, we believe that the techniques and methods which we have shown to work 
here will be a good starting point when we try to solve the system. 
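
One standard way to choose the Newton step size automatically is a backtracking line search on the residual. The Python sketch below applies it to a scalar model equation; both the halving strategy and the test function are assumptions made for illustration and are not necessarily the step-size rule used in this work.

```python
import math


def damped_newton(F, dF, x, tol=1e-10, max_iter=50):
    """Newton's method with step halving until the residual decreases.

    Returns the approximate root and the number of Newton iterations taken.
    """
    for k in range(max_iter):
        r = F(x)
        if abs(r) < tol:
            return x, k
        step = -r / dF(x)
        lam = 1.0
        # Backtrack: shrink the step until |F| is smaller than before.
        while abs(F(x + lam * step)) >= abs(r) and lam > 1e-8:
            lam *= 0.5
        x += lam * step
    return x, max_iter


# arctan(x) = 0 from x0 = 3: the full Newton step overshoots badly here,
# while the damped iteration converges to the root at zero.
root, iterations = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```

Once the iterate is close to the solution, the full step is accepted and the usual fast Newton convergence is recovered, so the damping only costs extra residual evaluations in the early iterations.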

We ran the code in parallel on a network of workstations and verified that we get the same mathematical
results as when it is run sequentially. However, we do not have any parallel efficiency results yet. One of our
next goals is to test the parallel efficiency of our approach. 

The form of the nonlinear term in radiation transport equations can vary. So far we have only considered 
the weakest or least nonlinear form. We would also like to rerun our experiments using the other forms of 
the equations. 

This research was conducted in collaboration with David Keyes and Alex Pothen (Old Dominion Uni- 
versity and ICASE) and Dimitri Mavriplis (ICASE) as part of an ASCI project. 

HANS ZIMA 

Feedback-directed and Adaptive Compilation

Traditionally, compilation has been seen as a batch process, in which a high-level language is translated 
into a machine or assembly language executable on a given target machine. Compilation is performed 
in a given machine/system environment known to the compiler, which can be exploited for optimizing
the target program. If the environment is not known at compilation time, or if it may change during 
execution, the target program has to be parameterized accordingly. The late binding associated with such 
a parameterization guarantees flexibility on the one hand, but on the other hand may result in less efficient 
code than an early binding approach. The objective of this study is to examine the changing role
of the compiler in modern computing environments and its interrelationship with performance tools. 

The traditional view of compilation can no longer be maintained, for reasons due to the evolution of
computing systems, languages, and compiling techniques. For example, in a heterogeneous environment 
(which may encompass the whole Internet), a client may send a source program (or a partially translated 
intermediate version of the source) to a remote server for compilation and execution. Similarly, in contrast to 
traditional static compilation, the Java HotSpot virtual machine identifies bottlenecks during interpretation 
of a Java program, and optimizes execution by performing on-the-fly compilation to native code. The 
inspector/executor approach, which is routinely used for the runtime optimization of parallel loops
in high-level languages, is an example of runtime compilation using feedback based on information gained
during execution. Systems such as ATLAS and FFTW use performance feedback to optimize the code for a 
given environment. A number of programming systems (such as the AURORA Compilation Environment) 
use performance feedback from execution traces for performance tuning in the compile/execute cycle.
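
The performance-feedback idea can be sketched as an empirical search over a tunable parameter. The following is only a toy illustration, not how ATLAS, FFTW, or any system named above is implemented: the kernel `blocked_sum`, the helper `autotune`, and the candidate block sizes are hypothetical.

```python
import time

def blocked_sum(data, block):
    """Sum a list block by block; 'block' is the tunable parameter."""
    total = 0.0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total

def autotune(kernel, data, candidates, repeats=3):
    """Time each candidate parameter and return the fastest one with all timings."""
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            best = min(best, time.perf_counter() - t0)
        timings[param] = best      # keep the best of several runs
    return min(timings, key=timings.get), timings

data = [float(i) for i in range(20000)]
candidates = [16, 256, 4096]
best_block, timings = autotune(blocked_sum, data, candidates)
```

The selected parameter depends on the machine on which the search runs, which is the point of binding it late: the same source yields a different tuned variant in each environment.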

We are currently developing a taxonomy of the existing approaches in this field. Following this, we will 
study the possibility of extending the Vienna Fortran Compilation system and related performance tools to 
demonstrate proof-of-concept solutions for relevant application problems. 

This research was conducted in collaboration with Piyush Mehrotra (ICASE). 

REPORTS AND ABSTRACTS


Sidilkover, David: A new time-space accurate scheme for hyperbolic problems I: Quasi-explicit case. ICASE
Report No. 98-25, (NASA/CR-1998-208436), October 27, 1998, 24 pages. Submitted to Communications in
Applied Analysis. 

This paper presents a new discretization scheme for hyperbolic systems of conservation laws. It satisfies
the TVD property and relies on the new high-resolution mechanism which is compatible with the genuinely 
multidimensional approach proposed recently. This work can be regarded as a first step towards extending 
the genuinely multidimensional approach to unsteady problems. Discontinuity capturing capabilities and 
accuracy of the scheme are verified by a set of numerical tests. 

Mavriplis, Dimitri J.: On convergence acceleration techniques for unstructured meshes. ICASE Report No.
98-44, (NASA/CR-1998-208732), November 2, 1998, 35 pages.

A discussion of convergence acceleration techniques as they relate to computational fluid dynamics prob- 
lems on unstructured meshes is given. Rather than providing a detailed description of particular methods, 
the various different building blocks of current solution techniques are discussed and examples of solution 
strategies using one or several of these ideas are given. Issues relating to unstructured grid CFD problems 
are given additional consideration, including suitability of algorithms to current hardware trends, mem- 
ory and cpu tradeoffs, treatment of nonlinearities, and the development of efficient strategies for handling 
anisotropy-induced stiffness. The outlook for future potential improvements is also discussed.

Povitsky, A.: Parallel directionally split solver based on reformulation of pipelined Thomas algorithm.
ICASE Report No. 98-45, (NASA/CR-1998-208733), October 27, 1998, 30 pages. To be submitted to SIAM
Journal of Scientific Computing. 

In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The 
proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the 
backward step computations immediately after the completion of the forward step computations for the first 
portion of lines. This algorithm has data available for other computational tasks while processors are idle 
from the Thomas algorithm. 

The proposed 3-D directionally split solver is based on the static scheduling of processors where local and 
non-local, data-dependent and data-independent computations are scheduled while processors are idle. A
theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show 
an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. 

It is shown by computational experiments and by the theoretical model that the proposed algorithm
reduces the parallelization penalty by about a factor of two relative to the basic algorithm for the range of
numbers of processors (subdomains) and numbers of grid nodes per subdomain considered.

Chow, P.L., and L. Maestrello: Vibrational control of a nonlinear elastic panel. ICASE Report No. 98-46,
(NASA/CR-1998-208734), November 5, 1998, 19 pages. To be submitted to the Journal of the Acoustical
Society of America. 

The paper is concerned with the stabilization of the nonlinear panel oscillation by an active control. The 
control is actuated by a combination of additive and parametric vibrational forces. A general method of 
vibrational control is presented for stabilizing panel vibration satisfying a nonlinear beam equation. To obtain 
analytical results, a perturbation technique is used in the case of weak nonlinearity. Possible applications to
other types of problems are briefly discussed.

Booker, Andrew J., J.E. Dennis, Jr., Paul D. Frank, David B. Serafini, Virginia Torczon, and Michael W.
Trosset: A rigorous framework for optimization of expensive functions by surrogates. ICASE Report No.
98-47, (NASA/CR-1998-208735), November 5, 1998, 24 pages. To appear in Structural Optimization.

The goal of the research reported here is to develop rigorous optimization algorithms to apply to some
engineering design problems for which direct application of traditional optimization approaches is not
practical. This paper presents and analyzes a framework for generating a sequence of approximations to the
objective function and managing the use of these approximations as surrogates for optimization. The result 
is to obtain convergence to a minimizer of an expensive objective function subject to simple constraints. The 
approach is widely applicable because it does not require, or even explicitly approximate, derivatives of the 
objective. Numerical results are presented for a 31-variable helicopter rotor blade design example and for a 
standard optimization test example. 

Povitsky, A.: Parallelization of the pipelined Thomas algorithm. ICASE Report No. 98-48,
(NASA/CR-1998-208736), December 3, 1998, 26 pages. Submitted to the Journal of Parallel and Distributed Computing.

In this study the following questions are addressed. Is it possible to improve the parallelization efficiency 
of the Thomas algorithm? How should the Thomas algorithm be formulated in order to get solved lines that 
are used as data for other computational tasks while processors are idle? 

To answer these questions, two-step pipelined algorithms (PAs) are introduced formally. It is shown 
that the idle processor time is invariant with respect to the order of backward and forward steps in PAs 
starting from one outermost processor. The advantage of PAs starting from two outermost processors is 
small. Versions of the pipelined Thomas algorithms considered here fall into the category of PAs. 

These results show that the parallelization efficiency of the Thomas algorithm cannot be improved di- 
rectly. However, the processor idle time can be used if some data has been computed by the time processors 
become idle. To achieve this goal the Immediate Backward pipelined Thomas Algorithm (IB-PTA) is devel- 
oped in this article. The backward step is computed immediately after the forward step has been completed 
for the first portion of lines. This enables the completion of the Thomas algorithm for some of these lines be- 
fore processors become idle. An algorithm for generating a static processor schedule recursively is developed. 
This schedule is used to switch between forward and backward computations and to control communications 
between processors. The advantage of the IB-PTA over the basic PTA is the presence of solved lines, which 
are available for other computations, by the time processors become idle. 
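
For reference, the forward and backward steps mentioned in this abstract can be seen in a serial sketch of the Thomas algorithm for a single tridiagonal line. This is the textbook serial form only; it does not reproduce the pipelined variants or the IB-PTA schedule, and the test system is a hypothetical example.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a = sub-diagonal, b = main diagonal, c = super-diagonal, d = right-hand side.
    a[0] and c[-1] are unused.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    # Forward step: eliminate the sub-diagonal.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Backward step: back-substitute starting from the last unknown.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Hypothetical test system -x[i-1] + 2 x[i] - x[i+1] = d[i] with solution [1, 2, 3, 4].
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 5.0]
x = thomas(a, b, c, d)
```

Because the backward step cannot start on a line until its forward step has finished, a processor in a pipelined parallel version waits for data from its neighbors, which is the idle time the abstract refers to.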

Rubinstein, Robert, and Ye Zhou: Effects of helicity on Lagrangian and Eulerian time correlations in
turbulence. ICASE Report No. 98-49, (NASA/CR-1998-208737), November 5, 1998, 10 pages. Submitted to
Physics of Fluids. 

Taylor series expansions of turbulent time correlation functions are applied to show that helicity influ- 
ences Eulerian time correlations more strongly than Lagrangian time correlations: to second order in time, 
the helicity effect on Lagrangian time correlations vanishes, but the helicity effect on Eulerian time correla- 
tions is nonzero. Fourier analysis shows that the helicity effect on Eulerian time correlations is confined to 
the largest inertial range scales. Some implications for sound radiation by swirling flows are discussed. 

Abarbanel, Saul, Adi Ditkowski, and Amir Yefet: Bounded error schemes for the wave equation on complex 
domains. ICASE Report No. 98-50, (NASA/CR-1998-208740), November 20, 1998, 17 pages. Submitted to
IEEE Trans. Antennas Propagat. 

This paper considers the application of the method of boundary penalty terms (“SAT”) to the numerical 
solution of the wave equation on complex shapes with Dirichlet boundary conditions. A theory is developed, 
in a semi-discrete setting, that allows the use of a Cartesian grid on complex geometries, yet maintains the 
order of accuracy with only a linear temporal error-bound. A numerical example, involving the solution of 
Maxwell’s equations inside a 2-D circular wave-guide demonstrates the efficacy of this method in comparison 
to others (e.g., the staggered Yee scheme): we achieve a decrease of two orders of magnitude in the level of
the $L_2$-error.

Darmofal, David: Eigenmode analysis of boundary conditions for the one-dimensional preconditioned
Euler equations. ICASE Report No. 98-51, (NASA/CR-1998-208741), November 20, 1998, 15 pages. To be
submitted to AIAA 1999 CFD Conference and SIAM Journal of Numerical Analysis. 

An analysis of the effect of local preconditioning on boundary conditions for the subsonic, one-dimensional 
Euler equations is presented. Decay rates for the eigenmodes of the initial boundary value problem are 
determined for different boundary conditions. Riemann invariant boundary conditions based on the unpre- 
conditioned Euler equations are shown to be reflective with preconditioning, and, at low Mach numbers, 
disturbances do not decay. Other boundary conditions are investigated which are non-reflective with
preconditioning, and numerical results are presented confirming the analysis.

Tsynkov, Semyon, Saul Abarbanel, Jan Nordstrom, Viktor Ryaben’kii, and Veer Vatsa: Global artificial 
boundary conditions for computation of external flow problems with propulsive jets. ICASE Report No. 98-52, 
(NASA/CR-1998-208746), December 3, 1998, 25 pages. Submitted to the 14th AIAA CFD Conference. 

We propose new global artificial boundary conditions (ABC’s) for computation of flows with propulsive 
jets. The algorithm is based on application of the difference potentials method (DPM). Previously, similar 
boundary conditions have been implemented for calculation of external compressible viscous flows around 
finite bodies. The proposed modification substantially extends the applicability range of the DPM-based 
algorithm. In the paper, we present the general formulation of the problem, describe our numerical method- 
ology, and discuss the corresponding computational results. The particular configuration that we analyze is 
a slender three-dimensional body with boat-tail geometry and supersonic jet exhaust in a subsonic external 
flow under zero angle of attack. Similarly to the results obtained earlier for the flows around airfoils and 
wings, current results for the jet flow case corroborate the superiority of the DPM-based ABC’s over standard
local methodologies from the standpoints of accuracy, overall numerical performance, and robustness. 

Xu, Kun: Gas-kinetic theory based flux splitting method for ideal magnetohydrodynamics. ICASE Report No. 
98-53, (NASA/CR-1998-208747), December 3, 1998, 22 pages. To be submitted to the Journal of Computational
Physics. 

A gas-kinetic solver is developed for the ideal magnetohydrodynamics (MHD) equations. The new 
scheme is based on the direct splitting of the flux function of the MHD equations with the inclusion of 
“particle” collisions in the transport process. Consequently, the artificial dissipation in the new scheme is 
much reduced in comparison with the MHD Flux Vector Splitting Scheme. At the same time, the new 
scheme is compared with the well-developed Roe-type MHD solver. It is concluded that the kinetic MHD
scheme is more robust and efficient than the Roe-type method, and the accuracy is competitive. In this paper 
the general principle of splitting the macroscopic flux function based on the gas-kinetic theory is presented.
The flux construction strategy may shed some light on the possible modification of AUSM- and CUSP-type
schemes for the compressible Euler equations, as well as to the development of new schemes for a non-strictly 
hyperbolic system. 

Holt, Maurice: 3D characteristics. ICASE Report No. 98-54, (NASA/CR-1998-208958), December 23, 1998,
10 pages. Submitted to Springer Series in Computational Physics. 

Contributions to the Method of Characteristics in Three Dimensions, which previously received incom- 
plete recognition, are reviewed. They mostly follow from a fundamental paper by Rusanov which led to 
several developments in Russia, described by Chushkin. 

Lian, Yongsheng, and Kun Xu: A gas-kinetic scheme for reactive flows. ICASE Report No. 98-55,
(NASA/CR-1998-208963), December 23, 1998, 16 pages. To be submitted to Computers and Fluids.

In this paper, the gas-kinetic BGK scheme for the compressible flow equations is extended to chemical 
reactive flow. The mass fraction of the unburnt gas is implemented into the gas kinetic equation by assigning 
a new internal degree of freedom to the particle distribution function. The new variable can also be used to
describe fluid trajectories for nonreactive flows. Due to the gas-kinetic BGK model, the current scheme
basically solves the Navier-Stokes chemical reactive flow equations. Numerical tests validate the accuracy
and robustness of the current kinetic method. 

Xu, Kun, and Shiu-Hong Lui: Rayleigh-Benard simulation using gas-kinetic BGK scheme in the
incompressible limit. ICASE Report No. 98-56, (NASA/CR-1998-208964), December 23, 1998, 19 pages. Submitted to
Physical Review E. 

In this paper, a gas-kinetic BGK model is constructed for the Rayleigh-Benard thermal convection in 
the incompressible flow limit, where the flow field and temperature field are described by two coupled BGK 
models. Since the collision times and pseudo-temperature in the corresponding BGK models can be different, 
the Prandtl number can be changed to any value instead of a fixed Pr = 1 in the original BGK model. The
2D Rayleigh-Benard thermal convection is studied and numerical results are compared with theoretical ones 
as well as other simulation results. 

Lewis, Robert Michael, Virginia Torczon, and Michael W. Trosset: Why pattern search works. ICASE Report
No. 98-57, (NASA/CR-1998-208966), December 23, 1998, 17 pages. To appear in Optima, The Mathematical
Programming Society Newsletter. 

Pattern search methods are a class of direct search methods for nonlinear optimization. Since the 
introduction of the original pattern search methods in the late 1950s and early 1960s, they have remained 
popular with users due to their simplicity and the fact that they work well in practice on a variety of 
problems. More recently, the fact that they are provably convergent has generated renewed interest in the 
nonlinear programming community. The purpose of this article is to describe what pattern search methods 
are and why they work. 
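
To make the idea concrete, a minimal sketch of one simple member of this class (coordinate or "compass" search) follows. It is an illustration under stated assumptions rather than the formulation analyzed in the report: the function name `compass_search`, the step-halving rule, and the quadratic test function are hypothetical choices.

```python
def compass_search(f, x0, step=1.0, tol=1e-8, max_iter=10000):
    """Derivative-free compass search: poll +/- each coordinate direction."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                ft = f(trial)
                if ft < fx:            # accept the first improving point
                    x, fx = trial, ft
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5                # no poll point improved: contract the pattern
    return x, fx

def quadratic(p):
    """Hypothetical test objective with minimizer (1, -0.5)."""
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2

x_min, f_min = compass_search(quadratic, [0.0, 0.0])
```

Only function values are compared, so no derivatives of the objective are needed.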

Kumfert, Gary, and Alex Pothen: An object-oriented collection of minimum degree algorithms: Design,
implementation, and experiences. ICASE Report No. 99-1, (NASA/CR-1999-208977), January 29, 1999, 15
pages. In Computing in Object-oriented Parallel Environments, Lecture Notes in Computer Science 1505. 

The multiple minimum degree (MMD) algorithm and its variants have enjoyed 20+ years of research 
and progress in generating fill-reducing orderings for sparse, symmetric positive definite matrices. Although 
conceptually simple, efficient implementations of these algorithms are deceptively complex and highly spe- 
cialized. 

In this case study, we present an object-oriented library that implements several recent minimum degree- 
like algorithms. We discuss how object-oriented design forces us to decompose these algorithms in a different 
manner than earlier codes and demonstrate how this impacts the flexibility and efficiency of our C++ 
implementation. We compare the performance of our code against other implementations in C or Fortran. 

Dobrian, Florin, Gary Kumfert, and Alex Pothen: Object-oriented design for sparse direct solvers. ICASE 
Report No. 99-2, (NASA/CR-1999-208978), January 20, 1999, 12 pages. In Computing in Object-oriented
Parallel Environments, Lecture Notes in Computer Science 1505. 

We discuss the object-oriented design of a software package for solving sparse, symmetric systems of 
equations (positive definite and indefinite) by direct methods. At the highest layers, we decouple data 
structure classes from algorithmic classes for flexibility. We describe the important structural and algorithmic 
classes in our design, and discuss the trade-offs we made for high performance. The kernels at the lower 
layers were optimized by hand. Our results show no performance loss from our object-oriented design, while 
providing flexibility, ease of use, and extensibility over solvers using procedural design. 

Cleaveland, Rance, Gerald Lüttgen, and V. Natarajan: Priority in process algebras. ICASE Report No. 99-3,
(NASA/CR-1999-208979), January 25, 1999, 48 pages. To appear in Handbook of Process Algebras. 

This paper surveys the semantic ramifications of extending traditional process algebras with notions of 
priority that allow for some transitions to be given precedence over others. These enriched formalisms allow 
one to model system features such as interrupts, prioritized choice, or real-time behavior. 

Approaches to priority in process algebras can be classified according to whether the induced notion of 
pre-emption on transitions is global or local and whether priorities are static or dynamic. Early work in the 
area concentrated on global pre-emption and static priorities and led to formalisms for modeling interrupts 
and aspects of real-time, such as maximal progress, in centralized computing environments. More recent 
research has investigated localized notions of pre-emption in which the distribution of systems is taken into
account, as well as dynamic priority approaches, i.e., those where priority values may change as systems
evolve. The latter allows one to model behavioral phenomena such as scheduling algorithms and also enables 
the efficient encoding of real-time semantics. 

Technically, this paper studies the different models of priorities by presenting extensions of Milner’s 
Calculus of Communicating Systems (CCS) with static and dynamic priority as well as with notions of 
global and local pre-emption. In each case the operational semantics of CCS is modified appropriately, 
behavioral theories based on strong and weak bisimulation are given, and related approaches for different 
process-algebraic settings are discussed.

Lüttgen, Gerald, Girish Bhat, and Rance Cleaveland: A practical approach to implementing real-time
semantics. ICASE Report No. 99-4, (NASA/CR-1999-208980), January 25, 1999, 33 pages. To appear in
Annals of Software Engineering. 

This paper investigates implementations of process algebras which are suitable for modeling concurrent 
real-time systems. It suggests an approach for efficiently implementing real-time semantics using dynamic 
priorities. For this purpose a process algebra with dynamic priority is defined, whose semantics corresponds 
one-to-one to traditional real-time semantics. The advantage of the dynamic-priority approach is that it
drastically reduces the state-space sizes of the systems in question while preserving all properties of their 
functional and real-time behavior. 

The utility of the technique is demonstrated by a case study which deals with the formal modeling 
and verification of the SCSI-2 bus-protocol. The case study is carried out in the Concurrency Workbench 
of North Carolina, an automated verification tool in which the process algebra with dynamic priority is 
implemented. It turns out that the state space of the bus-protocol model is about an order of magnitude 
smaller than the one resulting from real-time semantics. The accuracy of the model is proved by applying 
model checking for verifying several mandatory properties of the bus protocol. 

Lui, Shiuhong, and Kun Xu: Entropy analysis of kinetic flux vector splitting schemes for the compressible 
Euler equations. ICASE Report No. 99-5, (NASA/CR-1999-208981), January 29, 1999, 18 pages.

Flux Vector Splitting (FVS) schemes are one group of approximate Riemann solvers for the compressible
Euler equations. In this paper, the discretized entropy condition of the Kinetic Flux Vector Splitting (KFVS)
scheme based on the gas-kinetic theory is proved. The proof of the entropy condition involves the entropy
definition difference between the distinguishable and indistinguishable particles. 

Xu, Kun: Gas evolution dynamics in Godunov-type schemes and analysis of numerical shock instability.
ICASE Report No. 99-6, (NASA/CR-1999-208985), January 28, 1999, 21 pages. To be submitted to the
International Journal of Numerical Methods in Fluids. 

In this paper we study the gas evolution dynamics of the exact and approximate Riemann
solvers, e.g., the Flux Vector Splitting (FVS) and the Flux Difference Splitting (FDS) schemes. Since the
FVS scheme and the Kinetic Flux Vector Splitting (KFVS) scheme have the same physical mechanism and
similar flux functions, the weaknesses and advantages of the FVS scheme are examined through an analysis
of the discretized KFVS scheme. The subtle dissipative mechanism of the Godunov method in the 2D
case is also analyzed, and the physical reason for shock instability, i.e., carbuncle phenomena and odd-even
decoupling, is presented.

Rubinstein, Robert: Double resonance and spectral scaling in the weak turbulence theory of rotating and
stratified turbulence. ICASE Report No. 99-7, (NASA/CR-1999-208996), February 5, 1999, 19 pages. Submitted
to Physical Review E. 

In rotating turbulence, stably stratified turbulence, and in rotating stratified turbulence, heuristic
arguments concerning the turbulent time scale suggest that the inertial range energy spectrum scales as $k^{-2}$.
From the viewpoint of weak turbulence theory, there are three possibilities which might invalidate these
arguments: four-wave interactions could dominate three-wave interactions leading to a modified inertial range
energy balance, double resonances could alter the time scale, and the energy flux integral might not converge. 
It is shown that although double resonances exist in all of these problems, they do not influence overall en- 
ergy transfer. However, the resonance conditions cause the flux integral for rotating turbulence to diverge 
logarithmically when evaluated for a $k^{-2}$ energy spectrum; therefore, this spectrum requires logarithmic
corrections. Finally, the role of four-wave interactions is briefly discussed. 

Rubinstein, Robert, and Ye Zhou: The dissipation range in rotating turbulence. ICASE Report No. 99-8, 
(NASA/CR-1999-208997), February 5, 1999, 11 pages. Submitted to Physical Review E.

The dissipation range energy balance of the direct interaction approximation is applied to rotating 
turbulence when rotation effects persist well into the dissipation range. Assuming that $Ro\,Re^{1/2} \ll 1$
and that three-wave interactions are dominant, the dissipation range is found to be concentrated in the
wave vector plane perpendicular to the rotation axis. This conclusion is consistent with previous analyses 
of inertial range energy transfer in rotating turbulence, which predict the accumulation of energy in those 
scales. 

Mavriplis, Dimitri J., and S. Pirzadeh: Large-scale parallel unstructured mesh computations for 3D high-lift 
analysis. ICASE Report No. 99-9, (NASA/CR-1999-208999), February 11, 1999, 26 pages. Submitted to
AIAA Journal of Aircraft. 

A complete “geometry to drag-polar” analysis capability for the three-dimensional high-lift configurations 
is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround 
for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a
capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are 
initially generated on a workstation, and subsequently refined on a supercomputer. The flow is solved on
these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. 
Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off 
configuration using up to 24.7 million grid points. The feasibility of using this approach in a production 
environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines 
using up to 1450 processors. 

Xu, Kun, and Li-Shi Luo: Connection between the lattice Boltzmann equation and the beam scheme. ICASE
Report No. 99-10, (NASA/CR-1999-209001), February 12, 1999, 14 pages. Submitted to Physical Review E.

In this paper we analyze and compare the lattice Boltzmann equation with the beam scheme in detail.
We note the similarities and differences between the lattice Boltzmann equation and the beam scheme. We
show that the accuracy of the lattice Boltzmann equation is indeed second order in space. We discuss the
advantages and limitations of the lattice Boltzmann equation and the beam scheme. Based on our analysis, we
propose an improved multi-dimensional beam scheme.

Baggag, Abdelkader, Harold Atkins, Can Ozturan, and David Keyes: Parallelization of an object-oriented 
unstructured aeroacoustics solver. ICASE Report No. 99-11, (NASA/CR-1999-209098), February 16, 1999, 
16 pages. Submitted to the Proceedings of the 9th SIAM Conference on Parallel Processing for Scientific 
Computing. 

A computational aeroacoustics code based on the discontinuous Galerkin method is ported to several 
parallel platforms using MPI. The discontinuous Galerkin method is a compact high-order method that 
retains its accuracy and robustness on non-smooth unstructured meshes. In its semi-discrete form, the 
discontinuous Galerkin method can be combined with explicit time marching methods making it well suited 
to time accurate computations. The compact nature of the discontinuous Galerkin method also makes it 
well suited for distributed memory parallel platforms. The original serial code was written using an object- 
oriented approach and was previously optimized for cache-based machines. The port to parallel platforms 
was achieved simply by treating partition boundaries as a type of boundary condition. Code modifications 
were minimal because boundary conditions were abstractions in the original program. Scalability results 
are presented for the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. Slightly superlinear 
speedup is achieved on a fixed-size problem on the Origin, due to cache effects. 

Arian, E., A. Batterman, and E.W. Sachs: Approximation of the Newton step by a defect correction process. 
ICASE Report No. 99-12, (NASA/CR-1999-209099), February 16, 1999, 35 pages. To be submitted to SIAM
Journal of Optimization. 

In this paper, an optimal control problem governed by a partial differential equation is considered. The 
Newton step for this system can be computed by solving a coupled system of equations. To do this efficiently 
with an iterative defect correction process, a modifying operator is introduced into the system. This operator 
is motivated by local mode analysis. The operator can be used also for preconditioning in GMRES. We give 
a detailed convergence analysis for the defect correction process and show the derivation of the modifying 
operator. Numerical tests are done on the small disturbance shape optimization problem in two dimensions 
for the defect correction process and for GMRES. 

Rubinstein, Robert, and Aaron H. Auslender: Relaxation from steady states far from equilibrium and the 
persistence of anomalous shock behavior in weakly ionized gases. ICASE Report No. 99-13,
(NASA/CR-1999-209105), March 3, 1999, 13 pages. To be submitted to Physical Review E.

The decay of anomalous effects on shock waves in weakly ionized gases following plasma generator 
extinction has been measured in the anticipation that the decay time must correlate well with the relaxation 
time of the mechanism responsible for the anomalous effects. When the relaxation times cannot be measured
directly, they are inferred theoretically, usually assuming that the initial state is nearly in thermal equilibrium. 
In this paper, it is demonstrated that relaxation from any steady state far from equilibrium, including the
state of a weakly ionized gas, can proceed much more slowly than arguments based on relaxation from 
near equilibrium states might suggest. This result justifies a more careful analysis of the relaxation times 
in weakly ionized gases and suggests that although the experimental measurements of relaxation times did 
not lead to an unambiguous conclusion, this approach to understanding the anomalous effects may warrant 
further investigation. 

Diskin, Boris, and James L. Thomas: Solving upwind-biased discretizations: Defect-correction iterations.
ICASE Report No. 99-14, (NASA/CR-1999-209106), March 3, 1999, 27 pages. To be submitted to the
SIAM Journal of Scientific Computing. 

This paper considers defect-correction solvers for a second order upwind-biased discretization of the 2D 
convection equation. The following important features are reported:

1. The asymptotic convergence rate is about 0.5 per defect-correction iteration. 

2. If the operators involved in defect-correction iterations have different approximation orders, then
the initial convergence rates may be very slow. The number of iterations required to get into the 
asymptotic convergence regime might grow on fine grids as a negative power of h. In the case of a 
second order target operator and a first order driver operator, this number of iterations is roughly 
proportional to h -1 / 3 . 

3. If both operators are second order accurate, the defect-correction solver reaches 
the asymptotic convergence rate after three iterations at most. The same three iterations suffice 
to reduce the algebraic error below the truncation error level. 

A novel comprehensive half-space Fourier mode analysis for the defect-correction method is developed; 
it can also take into account the influence of discretized outflow boundary conditions. 
This analysis explains many phenomena observed in solving non-elliptic equations and provides a close 
prediction of the actual solution behavior. It predicts the convergence rate for each iteration and the 
asymptotic convergence rate. As a result of this analysis, a new, very efficient adaptive multigrid algorithm 
that solves the discrete problem to within a given accuracy is proposed. Numerical simulations confirm the 
accuracy of the analysis and the efficiency of the proposed algorithm. The results of the numerical tests are 
reported. 
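
The defect-correction iteration described in this abstract can be illustrated on a much simpler model. The sketch below is not the authors' 2D solver or their multigrid algorithm: it assumes the 1D convection equation u' = f with an inflow value at x = 0, a fully one-sided second order upwind target operator, and a first order upwind driver operator. All function names are hypothetical and chosen only for this illustration.

```python
import math


def solve_first_order(rhs, h, inflow):
    """Invert the first order upwind driver: (u[i] - u[i-1]) / h = rhs[i]."""
    u, prev = [], inflow
    for r in rhs:
        prev = prev + h * r  # forward substitution, node by node
        u.append(prev)
    return u


def apply_second_order(u, h, inflow):
    """Apply the second order upwind target operator to u.

    The first interior node falls back to the first order formula because
    the three-point stencil would reach past the inflow boundary.
    """
    ext = [inflow] + u  # ext[j] holds u_j, with u_0 the inflow value
    out = [(ext[1] - ext[0]) / h]
    for i in range(2, len(ext)):
        out.append((3.0 * ext[i] - 4.0 * ext[i - 1] + ext[i - 2]) / (2.0 * h))
    return out


def defect_correction(f, h, inflow, iterations):
    """Iterate u <- u + L1^{-1} (f - L2 u) and record the max-norm defect."""
    u = solve_first_order(f, h, inflow)  # initial guess from the driver alone
    history = []
    for _ in range(iterations):
        target = apply_second_order(u, h, inflow)
        defect = [fi - ti for fi, ti in zip(f, target)]
        history.append(max(abs(d) for d in defect))
        # The correction satisfies a homogeneous inflow condition.
        correction = solve_first_order(defect, h, 0.0)
        u = [ui + ci for ui, ci in zip(u, correction)]
    return u, history


if __name__ == "__main__":
    n = 16
    h = 1.0 / n
    xs = [(i + 1) * h for i in range(n)]
    u, history = defect_correction([math.cos(x) for x in xs], h, 0.0, 100)
    print("first and last defect:", history[0], history[-1])
    print("max error vs sin(x):", max(abs(ui - math.sin(x)) for ui, x in zip(u, xs)))
```

In this 1D model the iteration matrix is lower triangular with diagonal entries 0 and -1/2, so it shares the asymptotic rate of about 0.5 quoted above; the dependence of the initial transient on h reported for the 2D case is not reproduced by this sketch.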

INTERIM REPORTS 


Bokhari, Shahid H., and Dimitri J. Mavriplis: The Tera multithreaded architecture and unstructured meshes. 
ICASE Interim Report No. 33, (NASA/CR-1998-208953), December 11, 1998, 23 pages. 

The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed 
at the San Diego Supercomputer Center (SDSC). This machine has an architecture quite different from 
contemporary parallel machines. The computational processor is a custom design and the machine uses hardware 
to support very fine-grained multithreading. The main memory is shared, hardware-randomized, and flat. 
These features make the machine highly suited to the execution of unstructured mesh problems, which are 
difficult to parallelize on other architectures. 

We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, 
a code that solves the Euler equations on an unstructured mesh, on the 2-processor Tera MTA at SDSC. 

Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We 
were able to get an existing parallel code (designed for a shared memory machine) running on the Tera 
by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run 
in parallel on the Tera by judicious use of directives to invoke the “full/empty” tag bits of the machine to 
obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, 
and requires no attention to partitioning or placement of data, issues that would be of paramount importance 
in other parallel architectures. 


Yefet, A., and E. Turkel: Construction of three dimensional solutions for the Maxwell equations. ICASE 
Interim Report No. 34, (NASA/CR-1998-208954), December 11, 1998, 10 pages. 


We consider numerical solutions for the three dimensional time dependent Maxwell equations. We 
construct a fourth order accurate compact implicit scheme and compare it to the Yee scheme for free space 
in a box. 


Trosset, Michael W.: The Krigifier: A procedure for generating pseudorandom nonlinear objective functions 
for computational experimentation. ICASE Interim Report No. 35, (NASA/CR-1999-209000), February 11, 
1999, 11 pages. 

Comprehensive computational experiments to assess the performance of algorithms for numerical opti- 
mization require (among other things) a practical procedure for generating pseudorandom nonlinear objective 
functions. We propose a procedure that is based on the convenient fiction that objective functions are real- 
izations of stochastic processes. This report details the calculations necessary to implement our procedure 
for the case of certain stationary Gaussian processes and presents a specific implementation in the statistical 
programming language S-PLUS. 
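
The idea of treating an objective function as a realization of a stationary Gaussian process can be sketched as follows. This is not the S-PLUS implementation described in the report: it is a minimal one-dimensional illustration in Python that assumes a zero-mean process with a Gaussian covariance function, draws values at a few design sites, and interpolates them with the simple kriging predictor. The function names and the parameters theta, sigma2, and seed are hypothetical.

```python
import math
import random


def cholesky(a):
    """Return the lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def make_objective(sites, theta=50.0, sigma2=1.0, seed=0):
    """Build a pseudorandom objective function from one Gaussian process draw.

    Returns the interpolating function and the sampled values at the sites.
    """
    rng = random.Random(seed)  # a fixed seed makes the function reproducible

    def cov(s, t):
        # Assumed stationary covariance: depends only on the distance s - t.
        return sigma2 * math.exp(-theta * (s - t) ** 2)

    n = len(sites)
    low = cholesky([[cov(s, t) for t in sites] for s in sites])
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # values = L z has the prescribed covariance K = L L^T at the sites.
    values = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]

    # Kriging weights w = K^{-1} values = L^{-T} z, by back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(low[j][i] * w[j] for j in range(i + 1, n))
        w[i] = s / low[i][i]

    def objective(x):
        # Simple kriging predictor: a weighted sum of covariances to the sites.
        return sum(wi * cov(x, s) for wi, s in zip(w, sites))

    return objective, values


if __name__ == "__main__":
    sites = [i / 7 for i in range(8)]
    f, values = make_objective(sites, seed=1)
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, f(x))
```

Calling make_objective with different seeds yields different smooth test functions on the same design sites, which is the property a computational experiment on optimization algorithms would rely on.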